English comparative correlative constructions: A usage-based account

English Comparative Correlatives (CCs) consist of two clauses, C1 and C2: [The more we get together,]C1 [the happier we’ll be.]C2 Recently, large corpus studies based on the Corpus of Contemporary American English have unearthed various meso-constructions in English CCs using covarying–collexeme analysis. The present study tests these findings against data from the British National Corpus (BNC), aiming to replicate previous results against data from another standard variety of English (British English) and a corpus that is sampled from a wider range of registers. Over 2,000 CC tokens from the BNC were analyzed with regard to hypotactic features, filler types encountered as comparative elements, and deletion phenomena. Moreover, in contrast to earlier corpus studies (such as Hoffmann, Thomas, Jakob Horsch, and Thomas Brunner. 2019. “The more data, the better: a usage-based account of the English comparative correlative construction.” Cognitive Linguistics 30(1): 1–36), the present study also investigates the frequency of the semantically related C2C1 construction (You will be the happierC2, the more we get togetherC1) that previously has been found to be considerably less frequent than its counterpart. The results of the present analysis confirm that English CCs possess more paratactic than hypotactic features and, supporting most of the findings of Hoffmann, Horsch, and Brunner (2019) provide even stronger evidence for the existence of several symmetric meso-constructions.

latter, we advocate a usage-based constructional account that builds on Culicover and Jackendoff's (1999) analysis.
Construction Grammar (cf. Croft 2001; Goldberg 2006; Bybee 2010; Hoffmann and Trousdale 2013) maintains that the basic units of language are constructions, that is, pairings of FORM (which can include phonological, morphological as well as syntactic information) and MEANING (which can contain semantic, pragmatic as well as discourse-functional information; Croft and Cruse 2004: 268). Both Sag (2010) and Culicover and Jackendoff (1999) offer a constructional analysis of CCs, since their templates combine a FORM pole with a detailed semantic MEANING pole. However, both their analyses adopt a complete inheritance approach (see Hoffmann 2017a: 323-4) in that they only postulate the least number of constructions/constraints necessary to model English CCs. In contrast to this, we endorse a usage-based construction grammar approach to English CCs (Hoffmann 2014a, 2014b, 2018, 2019). Usage-based approaches (Bybee 2006, 2010) assume that the mental grammar of speakers is "shaped by the repeated exposure to specific utterances" (Hoffmann 2018: 184). In other words, if a pairing of FORM and MEANING (i.e., construction) is encountered by a speaker frequently enough, it will become stored, or entrenched (cf. Croft and Cruse 2004: 276-8), even if it could be licensed by more abstract constructions (i.e., generalized, abstract patterns such as (2) that can be used to produce a great number of instances of a construction). Moreover, usage-based analyses emphasize the role that authentic data play as the input for speakers' generalizations (see also Croft 2001; Barðdal 2008, 2011; Hoffmann 2019: 9-16). As Croft (2001) and Barðdal (2008) note, the input that speakers are exposed to does not always automatically lead to maximally abstract mental generalizations but can also lead to only partly schematic and partly substantive generalizations.
Furthermore, following mainstream usage-based approaches, we assume that mental representations are stored in taxonomic networks (cf. Croft and Cruse 2004: 262-5; Goldberg 2006: 215): speakers first of all encounter specific, substantive instances of a construction (the more money we come across, the more problems we see; Notorious B.I.G., Mo Money Mo Problems), which are stored in an exemplar-based fashion. Only structures with a high type frequency, that is, those that have been encountered with many different lexicalizations (the more Bill earned, the more he spent on clothes/the more Jane laughed, the more he felt uncomfortable/the more they heard, the more they wanted to know, …), all of which share a common meaning, contribute to the entrenchment of a more abstract CC construction such as (2) (cf. Goldberg 2006: 39, 98-101; see also Bybee 1985, 1995; Croft and Cruse 2004: 308-13). Following Hoffmann (2019: 17-18), we take statistically significant frequency effects unearthed by the analysis of corpus data as a proxy for the entrenchment of taxonomic networks (see also Bybee 2010: 10; Gries 2013: 97-101 as well as Stefanowitsch and Flach 2017).
In this article, we focus on the internal structure of CCs as well as the various entrenched constructional patterns. A main goal of the present study is to replicate previous studies to assess the validity of their results. Earlier usage-based studies have already speculated what parts of this network might look like, but these mostly relied on small data samples of around 40 tokens (Hoffmann 2014a, 2014b, 2018). Only two studies rely on a larger data sample of over 1,400 C1C2 tokens (Hoffmann 2019; Hoffmann et al. 2019), presenting considerable statistical evidence for the existence of several partly substantive and partly schematic CC constructions (the so-called "meso-constructions"; see below). Yet, this is only a single dataset and, as is standard procedure in any science, requires replication to assess the validity of its results. Moreover, while Hoffmann et al. (2019) drew on data from the Corpus of Contemporary American English (COCA), the present study replicates their analysis with more than 2,000 tokens from the British National Corpus (BNC) to test whether any variety-specific factors are at work.
Next, we will take a closer look at several syntactic features of English CCs that are particularly relevant from a usage-based perspective (Section 2). Then we will discuss the data and methodology of the present study (Section 3), followed by the results of the corpus study (Section 4) and a usage-based construction grammar analysis (Section 5).

Syntactic features of English CCs
Due to their semantic and syntactic properties, extensive research has been conducted on CCs (see Fillmore 1987;Fillmore et al. 1988;McCawley 1988;Michaelis 1994;Culicover and Jackendoff 1999;Borsley 2004;den Dikken 2005;Sag 2010;Cappelle 2011;Kim 2011;Hoffmann 2019). In the following, we will focus on five features that lend themselves to a corpus-based analysis (for other features, see Hoffmann 2019: 44-53) and have important implications for the entrenched constructional network of speakers.
The first of these features concerns the order of C1 and C2. Apart from arrangements like (1), where C1 and C2 appear to iconically encode the construction's cause-effect relationship, a C2C1 arrangement, sometimes referred to as CC′ (cf. Culicover and Jackendoff 1999: 549; Hoffmann 2017b), is also possible, as illustrated by (3), a variation of (1):
(3) [We'll be (the) happier] C2 [the more we get together.] C1
Whereas the "iconic" C1C2 order (i.e., the order motivated by the cause-effect semantics of the construction; Hoffmann 2014a: 32, Hoffmann 2017b) formally features two clause-initial elements (i.e., the), in C2C1 structures the is not obligatory in C2 and the comparative phrase is placed at the end of the clause. Of course, this raises the question as to whether one of the two orders is preferred over the other. As has been pointed out by Hoffmann (2017b, 2018: 186), Hawkins' competence-performance hypothesis (2004) predicts that the C1C2 order should be preferred by speakers because it corresponds to the cause-effect semantics of the construction. In fact, Hoffmann (2018), drawing on the BROWN corpora family,³ appears to confirm this, with his data revealing a ratio of 37:1 for the C1C2 over the C2C1 order (Hoffmann 2018: 193). Besides, his diachronic study of the competition between C1C2 and C2C1 (Hoffmann 2017b, 2019) indicates that the preference for the former, more iconic, structure has existed since the early Middle English period. This effect, however, has not been investigated in any larger corpus. Our present study now allows us to test this claim using a considerably larger database of more than 2,000 CC tokens, more than 16 times as many as in Hoffmann's (2018) study.
While the competition between C1C2 and C2C1 structures is in itself an interesting phenomenon, there are a couple of properties that only affect C1C2 constructions. Next, we will discuss two syntactic features that have been presented in the literature as an indication of a hypotactic relationship between the C1 and C2 clauses in C1C2 constructions, with C2 being the main clause and C1 being a subordinate clause.⁴ While diachronically, English CCs were clearly hypotactic in nature (Hoffmann 2014a: 81, 2017b), we will argue that synchronically the structure has become more paratactic in present-day English (a claim made in, e.g., Culicover and Jackendoff 1999).
The two hypotactic features that we will examine here are optional that-complementizers in C1 (4) and the possibility of optional subject-auxiliary inversion (SAI) in C2 clauses (5a). Note that SAI is not possible in the corresponding declarative clauses (5b).
(4) [The more [that] THAT-complementizer he says,] C1 [the less I wanna say.] C2
(5) a. [The more they work,] C1 [the more [I will/will I] SAI pay them.] C2
    b. *Will I pay them more.
In the literature, we find differing opinions regarding the grammaticality of these features: while den Dikken states that that-complementizers are possible in both C1 and C2 (2005: 502), Culicover and Jackendoff note that they "cannot appear in C2" (1999: 549). Hoffmann claims that in earlier stages of English, there was an "optional that-complementizer" in the C1 clause, whereas "colloquial [modern English] apparently licenses an optional that in both C1 and C2" (2014a: 96). However, in his COCA data set of 1,409 C1C2 tokens, that-complementizers appeared in less than 2% (= 24/1,409 tokens) of all C1s and less than 1% of all C2s (1,409 tokens; Hoffmann 2019: 125). For this reason, the present study investigates whether the current data set confirms the low frequency of this phenomenon.
With regard to SAI, Culicover and Jackendoff state that it may occur "marginally […] in C2 but not C1" (1999: 559) and Hoffmann claims that it is "optional" in C2 but "disfavoured" (2014a: 94). Similarly, den Dikken acknowledges the possibility of SAI in C2 but also states that it is "profoundly ungrammatical" in C1 (2003: 2). In his COCA study, Hoffmann found SAI only in C2, and again his data confirm that it is disfavored in American English (with only 3% = 10/337 BE tokens exhibiting SAI; Hoffmann 2019: 127).
As we will show, while historical remnants of these two hypotactic features may still be encountered in present-day English C1C2s (see Hoffmann 2014a: 30-9 for a diachronic overview of the development of that-complementizers and SAI since Old English), they appear with extremely low frequencies, leading us to claim that they are no longer central properties of the construction. In fact, there is substantial other evidence for present-day English C1C2s being largely symmetric structures. A first hint is the identical clause-initial elements in C1 and C2, but much more importantly, as previous research has revealed (Hoffmann 2019) and the present study will further confirm, there is concrete empirical evidence for an iconic tendency of formal symmetry between C1 and C2 in C1C2 CCs. This leads us to the next two phenomena that the present study investigates: filler types and deletion/truncation phenomena in C1C2 CCs.⁵
There are various filler types that can be inserted into the comparative element slot that follows the clause-initial elements. Apart from adverb phrases (AdvPs) and adjective phrases (AdjPs), as exemplified, respectively, by more and happier in (1), the comparative element can also be an NP, as in (6), or a prepositional phrase (PP), as in (7).
The filler type that occupies the comparative element slot is of interest because previous research has shown that speakers prefer certain parallel cross-clausal associations with regard to filler types in C1 and C2 (Hoffmann 2019; Hoffmann et al. 2019). Statistical analyses revealed that despite the many possible filler-type combinations between C1 and C2, it is only symmetric filler types in C1 and C2 such as AdvP C1-AdvP C2, AdjP C1-AdjP C2, or NP C1-NP C2 (Hoffmann et al. 2019: 14) that are significantly associated and can, therefore, be considered to have been entrenched as meso-constructions.
Again, however, reliable statistical evidence for these patterns only stems from a single, large-scale corpus (Hoffmann 2019;Hoffmann et al. 2019) and requires further empirical corroboration.
Finally, moving on to the optional clause slot of the English C1C2 construction, we will examine deletion and truncation phenomena. While deletion is ungrammatical in normal declarative clauses in present-day standard English (cf. *The price higher./*The product more interesting.), examples (8a-d) show that the verb BE in both C1 and C2⁷ can be optionally left out (see also McCawley 1988; Culicover and Jackendoff 1999; Borsley 2004):⁸
(8) a. The higher the price is, the more interesting the product is.
    b. The higher the price, the more interesting the product is.
    c. The higher the price is, the more interesting the product.
    d. The higher the price, the more interesting the product.
In fact, as the following examples found in the BNC demonstrate, we can further distinguish subtypes of BE-deletion in English C1C2s. In addition to full clauses (9), i.e., with the clause slot filled but not with any form of BE, we encountered the retention of BE as a main verb (MV) (10) and an auxiliary verb (11). Similarly, the deleted BE can be a main verb (MV) (12) or an auxiliary (13):
(BNC W_ac_polit_law_edu J6N)
In addition to the deletion of BE, C1C2s may also be truncated, i.e., with only the obligatory comparative clause slot filled and no optional clause realized (14). Note that there are also well-known truncated CCs that appear to have become lexicalized (15):
(15) The more, the merrier.
Interestingly, confirming earlier results from Hoffmann (2018), Hoffmann et al.'s (2019) COCA corpus study identified significant cross-clausal associations with regard to deletion and truncation phenomena in English C1C2s. As was the case with filler types, these associations reveal a preference for symmetric deletion and truncation: the strongest attraction was determined for the pairs TRUNCATED C1 -TRUNCATED C2 , FULL CLAUSE C1 -FULL CLAUSE C2 , and BE-DELETED MV C1 -BE-DELETED MV C2 (Hoffmann et al. 2019: 21).
The present study thus tries to replicate Hoffmann et al. (2019) as well as Hoffmann (2019) with respect to the just mentioned types of parallel syntactic phenomena in C1 and C2 but also extends it by looking at the competition of C1C2 versus C2C1. We, therefore, aim to give a more detailed account of the constructional network of English CC constructions. In particular, we seek to examine the following features in detail and answer the corresponding questions:
(1) C1 and C2 orders: as has been shown, there are two possible arrangements, C1C2 and C2C1. Do the frequencies in the corpus data confirm an iconic preference for the C1C2 over the C2C1 arrangement?
Besides, the C1C2 data will be investigated for the following phenomena (which are not relevant for C2C1 CCs; see above):
(2) SAI and that-complementizers: what do the data tell us about the frequency of these phenomena in C1 and C2? Is there evidence for a preference of paratactic over hypotactic features in ModE CCs?
(3) Filler types: various types of syntactic phrases may appear as fillers, including AdvPs, AdjPs, and NPs. Are there cross-clausal associations in the data, as predicted by Hoffmann et al. (2019) and Hoffmann (2019)?
(4) Deletion patterns: similar to filler types, can we determine cross-clausal associations regarding deletion and truncation phenomena?

Data and methodology
The methodology for the present study largely follows Hoffmann et al.'s study (2019: 9-13), which in turn was based on a number of previous usage-based construction grammar studies (Hoffmann 2014b, 2018).
In contrast to earlier studies, the present article uses corpus data obtained from the BNC to determine the entrenchment of various meso-constructions. Now, corpus evidence is, of course, not "typically representative of the input [and] output of a particular individual" (Stefanowitsch and Flach 2017: 122). Yet, following Stefanowitsch and Flach's "corpus-as-output" and "corpus-as-input" hypotheses (2017: 101-3), we assume that corpus data at least afford one window into the mental representations of constructions from a representative sample of language (see also Hoffmann 2019: 17-18). Similar to Hoffmann et al. (2019), the data for the present study were extracted from an off-line version of the BNC, which consists of 100 million words and contains samples of both written (about 90%) and spoken (about 10%) language. The off-line version of the BNC does not differ from the online version concerning contents and was chosen because it allows considerably more precise and faster queries using regular expressions. The BNC is only about one fifth the size of COCA, but (with the exception of Hoffmann et al. 2019) it is still considerably larger than the corpora used in any previous corpus study of CCs, which were based on 1-million-word corpora such as those of the International Corpus of English (ICE) family (Hoffmann 2014b; cf. also Hoffmann 2014a, 2018). The BNC was queried with the following regular expressions (using the CLAWS 5 tag set) to retrieve all instances of CC constructions in the corpus:
In total, this query yielded 4,256 tokens⁹ (3,665 tokens for the C1C2 pattern and 591 tokens for the C2C1 pattern), which were then coded by a team of five student assistants. The student assistants had received intensive training based on a sample data set and were provided with a detailed coding handbook that was composed by the researchers.
In addition to this, they attended regular weekly meetings with an author of this study to discuss their progress and any issues they encountered. This author, in turn, checked the student assistants' work for possible erroneous annotation.
The first task of the student assistants was to discard false positives, tokens containing portions of text that had been removed for copyright reasons,¹⁰ and the so-called "stacked constructions" where a third "C3" clause follows C1 and C2, as illustrated by (16):¹¹
(16) [the more serious the offence,] C1 [the more difficult to make peace,] C2 [the greater the compensation had to be.] C3 (BNC W_non_ac_soc_science ADW)
After this task was carried out, a data set with 2,180 relevant C1C2 and C2C1 tokens remained (i.e., 2,076 tokens were discarded). Subsequently, the student assistants coded these tokens for the features that were discussed in the previous section. Table 1 gives an overview of the factors and levels that were coded.
9 Note that in contrast to previous studies (Hoffmann 2019; Hoffmann et al. 2019), the present study is not just based on a data sample but is based on all tokens that the query yielded.
10 For copyright reasons, "[e]very 200 words, ten words are removed and are replaced with '@'" in the BNC (https://www.corpusdata.org/limitations.asp). As noted in Hoffmann et al. (2019), however, corpora which have been edited in this manner are "still an indispensable tool for mildly frequent structures such as the CC construction" (2019: 10). Also note that the removed portions should not cause any variation that would significantly affect the results of this study, as the omissions due to copyright do not affect "the 'relative frequency' of words, collocates and so on", since they occur randomly and therefore "without regard to context, affect all words equally the same" (https://www.corpusdata.org/limitations.asp).
11 These structures are, of course, interesting in terms of semantics, as C2 functions both as effect (in relation to C1) and as cause (in relation to C3). They will be analyzed in a separate future study due to the limited scope of this article.
The results of single variables such as ORDER were tested for statistical significance by a chi-square test. Cross-clausal associations, e.g., FILLER TYPE and DELETION, were assessed using a covarying-collexeme analysis (cf. Stefanowitsch and Gries 2005: 9-11), following Hoffmann et al. (2019). This was done via the Coll.analysis 3.2a script for R (Stefanowitsch and Gries 2005: 9; Gries 2007). The Coll.analysis 3.2a script uses a Fisher-Yates exact test, which is very precise and handles even small frequencies very well (Gries 2015a: 313). The script provides information on the statistical significance of associations via a value called collostructional strength, which is a negative log-transformed p value (cf. Gries 2007). These have to be interpreted as follows: "values with absolute values exceeding 1.30103 are significant at the level of 5% (since 10^(−1.30103) = 0.05)". Any value exceeding 2 corresponds to p < 0.01, and values above 3 indicate a significance level of p < 0.001. The reason for using negative log-transformed values is the better readability of results located "in the small range of 0.05 and 0", a range that corresponds to the "most interesting values" (Stefanowitsch and Gries 2005: 7).¹² Note that the covarying-collexeme analysis also provides information as to whether there is repulsion or attraction between two lexemes in a separate column of the output.
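To make the relation between p values and collostructional strength concrete, the following minimal Python sketch computes a one-sided Fisher exact p value for a single 2 × 2 co-occurrence table and log-transforms it. It is not the Coll.analysis 3.2a script: the function names, the one-sided (attraction) tail, and the cell counts in the usage line are illustrative assumptions.

```python
from math import comb, log10


def fisher_attraction_tail(a, b, c, d):
    """One-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    a = tokens with level X in C1 and level Y in C2, b = X with any other C2 level,
    c = any other C1 level with Y, d = neither X nor Y.
    Returns the p value as an exact (numerator, denominator) pair: the number of
    tables with a or more co-occurrences over the number of all possible tables.
    The attraction-only tail is an assumption made for this sketch.
    """
    n, row1, col1 = a + b + c + d, a + b, a + c
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail, comb(n, col1)


def collostructional_strength(a, b, c, d):
    """Negative log10 of the p value; exact integers avoid floating-point underflow."""
    tail, total = fisher_attraction_tail(a, b, c, d)
    return log10(total) - log10(tail)


def significance_label(strength):
    """Map a strength value onto the thresholds quoted in the text."""
    if strength > 3:
        return "p < 0.001"
    if strength > 2:
        return "p < 0.01"
    if strength > 1.30103:
        return "p < 0.05"
    return "n.s."


# Hypothetical cell counts, invented purely for illustration
strength = collostructional_strength(30, 10, 10, 50)
print(round(strength, 2), significance_label(strength))
```

Working with the numerator and denominator as integers is a design choice for the sketch: very strong associations produce p values far below what a float can represent, and taking the logarithm of each part separately keeps the strength value finite.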
In addition to this, the Coll.analysis 3.2a script also outputs in separate columns two ΔP values, which provide information on the directional dependence of one slot on another. The following example by Gries (2015b) serves to illustrate this: of and course are two strongly associated lexemes in English, but the preposition of co-occurs with many more lexemes than the noun course, which is preceded by significantly fewer words. Thus, course has a higher cue validity for of than the other way round. The ΔP value for course given of, ΔP(course|of), is consequently going to be lower than the ΔP value for of given course, ΔP(of|course). ΔP values range from −1 (strong repulsion) to +1 (strong attraction) and consequently allow for testing to what degree a slot in C1 depends on C2, ΔP(C1|C2), as well as the other way round, ΔP(C2|C1).
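The asymmetry described here can be sketched in a few lines of Python. The sketch assumes the standard definition ΔP(outcome|cue) = P(outcome|cue) − P(outcome|no cue); the function name and the bigram counts are hypothetical and are not taken from Gries (2015b) or from any corpus.

```python
def delta_p(both, cue_only, outcome_only, neither):
    """Delta P(outcome | cue) = P(outcome | cue) - P(outcome | cue absent)."""
    return both / (both + cue_only) - outcome_only / (outcome_only + neither)


# Hypothetical bigram counts (invented for illustration only)
of_course = 90         # "of" followed by "course"
of_other = 9_910       # "of" followed by any other word
other_course = 10      # any other word followed by "course"
other_other = 89_990   # all remaining bigrams

# How well does "of" predict "course"? Poorly, since "of" precedes many words.
dp_course_given_of = delta_p(of_course, of_other, other_course, other_other)

# How well does "course" predict a preceding "of"? Very well.
dp_of_given_course = delta_p(of_course, other_course, of_other, other_other)

print(f"dP(course|of) = {dp_course_given_of:.3f}")
print(f"dP(of|course) = {dp_of_given_course:.3f}")
```

For the cross-clausal analysis, the same function would be called twice per pattern, once with the C2 level as cue and once with the C1 level as cue, which yields ΔP(C1|C2) and ΔP(C2|C1).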
Note, as an anonymous reviewer pointed out, that a multivariate analysis of the data that tests the effect of several variables at the same time would, obviously, be preferable to the individual analysis of variables presented below. Yet, as discussed above, none of the clause-internal variables in Table 1 applies to C2C1 CCs. Consequently, these variables can only be investigated in C1C2s (and it is impossible to, e.g., run a mixed effects logistic regression model with these as independent variables and C1C2 vs. C2C1 as the dependent variable). Moreover, even for C1C2s, these variables are not orthogonal but strongly correlated. SAI is only possible if BE is retained (and not deleted). That-complementizers are only relevant for full clauses (and irrelevant for truncated clauses). Yet, one interaction that is potentially of interest (and where the variables are not correlated in the ways described above) is the combination of FILLER TYPE × DELETION. We have addressed this issue by collapsing the factors FILLER TYPE C1 and DELETION C1 as well as FILLER TYPE C2 and DELETION C2 and running a covarying-collexeme analysis over these interaction data.  Table 2 provides an overview of the frequencies.
As Table 2 shows, these results confirm the strong preference for iconic C1C2 constructions in present-day English as discussed in Section 2, with a ratio of almost 15:1 (χ² = 1,659.5, df = 1, p < 0.001). Yet, while the C2C1 construction has been dispreferred ever since the Middle English period (Hoffmann 2017b, 2019), it is interesting that it still remained a constructional option for speakers, albeit a rather infrequent one. Hoffmann (2018) speculated that this has to do with a pragmatic, focusing function that the C2C1 construction has, as is evident from the distribution of the focus particle even (cf. It becomes even FOCUS more interesting C2, the more you think about it C1. vs. ?The more you think about it C1, the more interesting it even FOCUS becomes C2.). However, this is a claim that requires further study.
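The chi-square value reported for ORDER can be retraced with a short Python sketch of a goodness-of-fit test against an equal split. The counts are an assumption: Table 2 is not reproduced here, so 2,041 C1C2 tokens is taken from the C1 total mentioned in the that-complementizer section and 139 C2C1 tokens is inferred as the remainder of the 2,180 coded tokens. The function names are hypothetical.

```python
from math import erfc, sqrt


def chi_square_equal_split(observed):
    """Goodness-of-fit statistic against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_df1(chi2):
    """Upper-tail p value of the chi-square distribution with one degree of freedom."""
    return erfc(sqrt(chi2 / 2))


# Assumed counts (inferred, see lead-in): C1C2 vs. C2C1 tokens
order_counts = [2041, 139]
chi2_order = chi_square_equal_split(order_counts)

print(f"ratio = {order_counts[0] / order_counts[1]:.1f}:1")
print(f"chi-square = {chi2_order:.1f}, p < 0.001: {p_value_df1(chi2_order) < 0.001}")
```

Under these assumed counts the statistic rounds to the 1,659.5 given above. With 50 inverted and 490 non-inverted C2s, the same function gives 358.52, the value reported for SAI.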

Hypotactic phenomena: that-complementizers and SAI
As discussed in Section 2, English C1C2s have been claimed to exhibit syntactic characteristics that suggest a hypotactic relationship between C1 and C2, where C2 is the main clause and C1 the corresponding subordinate clause. Two phenomena that are often cited as evidence for this are optional that-complementizers in C1 (19) and optional SAI in C2 (20 and 21). Note that that-complementizers have also been claimed to be possible in C2 (22).
In the following, we will take a closer look at what the BNC data reveal concerning these phenomena.

That-complementizers
Tables 3 and 4 provide an overview of the frequencies of that-complementizers in the data.
First, note that there are significantly more that-complementizers in C1 clauses (29 in total) than C2 clauses (only two; χ² = 1,926.6, df = 1, p < 0.001; for an example see (22)). Both instances of that-complementizers in C2 are from the spoken part of the corpus, which could be seen as (albeit limited) evidence that if that-complementizers appear in C2 at all, they do so in spoken English (Hoffmann 2014a: 96). However, even in C1 clauses, that-complementizers are used only marginally: of a total of 2,041 tokens, 29 instances amount to just over 1.42%. This dispreference of that-complementizers is again statistically significant (χ² = 2,033, df = 1, p < 0.001). Moreover, this probably also explains why in a covarying-collexeme analysis no pattern emerges as significant across the two clauses (with all four combinations, TRUE-TRUE, TRUE-FALSE, FALSE-TRUE, and FALSE-FALSE having a collostructional strength of 0.012; i.e., p > 0.05).

SAI
Table 5 presents the frequency of SAI in C2 clauses in the BNC data. As was discussed in Section 2, SAI is commonly cited as evidence for C2 being a main clause. The data do indeed reveal that there is not a single case of SAI in C1. However, of the 540 C2s that contained an auxiliary verb, there were only 50 instances of SAI, amounting to just 9.26%. Again, this effect is strongly significant (χ² = 358.52, df = 1, p < 0.001).

FILLER TYPE
Next, we present the results for the variable FILLER TYPE. Note that here, only data for C1C2 structures were analyzed. Figure 1 gives a first visual impression of the various filler types in C1 and C2; Tables 6 and 7 provide an overview of the frequencies of the various filler types in C1 and C2.
The mosaic plot in Figure 1 already suggests a clear tendency toward the mutual association of filler types across C1 and C2: if C1 has an AdjP as a filler, then C2 will highly probably also have an AdjP;¹³ the same applies to the other filler types. The covarying-collexeme analysis, whose results are provided in  (2019), the three significantly associated combinations in the BNC data are symmetric: AdvP C1-AdvP C2 (23), AdjP C1-AdjP C2 (24), and NP C1-NP C2 (25), with highly significant collostructional strengths. Our study thus offers corroborating evidence for the claim that these filler-type combinations form part of the English CC meso-constructional network as specific meso-constructions (cf.
With regard to the ΔP values provided by the covarying-collexeme analysis, it is notable that the unidirectional associations in the three significantly associated patterns discussed above range from 0.099 (ΔP(NP1|NP2)) to 0.281 (ΔP(AdvP2|AdvP1)). This suggests a certain degree of entrenchment of these three symmetric patterns at the meso-constructional level but nonetheless leaves room for creative variation across all possible patterns, including asymmetric ones. These values are very similar to those determined by Hoffmann et al. in their COCA sample (2019: 15).
13 It is notable that CCs in English clearly prefer AdjPs as filler types, as has been pointed out previously: "AdjPs are by far the most prototypical fillers that speakers encounter in CC constructions" (Hoffmann 2018: 188).

DELETION
Further support for the existence of parallel meso-constructional CC templates is evident from the results of the covarying-collexeme analysis for the variable DELETION. Note that only data for C1C2 structures were analyzed. Again, let us first take a look at the raw frequencies, as provided in Tables 9 and 10, and the corresponding mosaic plot (Figure 2).
As was the case with FILLER TYPE, a first glance at the plot in Figure 2 already suggests symmetry across C1 and C2: if there is, e.g., a FULL CLAUSE in C1, it is very likely that a FULL CLAUSE will also appear in C2. We can confirm this intuition by taking a look at the results of the covarying-collexeme analysis presented in Table 11:
Figure 2: BE-deletion across C1 and C2 in the BNC data.
Similar to the results determined for the variable FILLER TYPE, it is notable that only symmetric combinations exhibit statistically significant attraction, with the strongest one showing up for the
The unidirectional cue validities are notably higher than those determined for FILLER TYPEs, suggesting a stronger entrenchment. For example, the ΔP values for TRUNCATED C1-TRUNCATED C2 are fairly high (0.575 for TRUNCATED1|TRUNCATED2 and 0.353 for TRUNCATED2|TRUNCATED1), which means that TRUNCATION in C1 strongly predicts TRUNCATION in C2 and vice versa. Similarly, high ΔP values could be determined for BE-DELETED MV C1-BE-DELETED MV C2 and FULL CLAUSE C1-FULL CLAUSE C2, with the lowest score being 0.342 (FULL_CLAUSE1|FULL_CLAUSE2). The symmetric combinations BE-RETAINED MV C1-BE-RETAINED MV C2 and BE-RETAINED AUX C1-BE-RETAINED AUX C2 exhibit lower, yet still significant collostructional strength values (since values exceeding 1.30103 correspond to p < 0.05). Conversely, significant repulsion could only be determined for asymmetric pairs (see the lower part of Table 11).
Table 9, leading to a Bonferroni-corrected p value of 0.05/25 = 0.002, which corresponds to a collostructional strength value of 2.70 (since log10(0.002) = −2.69897). Consequently, the three symmetric structures that are positively associated in Table 9 remain significant even after this conservative correction for multiple testing.
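The Bonferroni-corrected thresholds mentioned in the notes (25 and 36 tested configurations) follow directly from the log transformation, as the following minimal Python sketch shows. The function names are hypothetical, and the strength value in the last line is an invented example rather than a value from the tables.

```python
from math import log10


def bonferroni_strength_threshold(n_tests, alpha=0.05):
    """Collostructional strength that a result must exceed after Bonferroni correction."""
    return -log10(alpha / n_tests)


def survives_correction(strength, n_tests, alpha=0.05):
    """True if the absolute strength value exceeds the corrected threshold."""
    return abs(strength) > bonferroni_strength_threshold(n_tests, alpha)


# 5 x 5 = 25 and 6 x 6 = 36 configurations, as stated in the notes
for n_tests in (25, 36):
    print(n_tests, round(bonferroni_strength_threshold(n_tests), 5))

# Invented example value: significant at p < 0.05 but not after correction for 36 tests
print(survives_correction(1.9, 36))
```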

FILLER TYPE × DELETION interaction
As the previous sections showed, both the variables FILLER TYPE and DELETION exhibit significant parallel associations across C1 and C2. While many other variables (such as SAI or that-complementizers) are not orthogonal to the other phenomena, FILLER TYPE as well as DELETION should in principle be able to vary independently of each other. At the same time, from a usage-based construction grammar perspective, it is very well possible that associations of these variables can also become entrenched. In order to test this, for each of the two clauses, the levels of the variables FILLER TYPE as well as DELETION were crossed and the resulting complex FILLER TYPE and DELETION factor was subjected to a covarying-collexeme analysis, the results of which can be found in Table 12 on the following page.
As can be seen in Table 12, the statistical analysis reveals 14 significant associations of FILLER TYPE and DELETION across C1 and C2. Similar to the individual results of the two variables, a great number of parallel structures emerge as significant. In fact, 6 of the 14 significant associations have perfectly identical factor combinations (with AdjP_BE_DELETED_MV in C1 and AdjP_BE_DELETED_MV in C2, AdjP_TRUNCATED in C1 and AdjP_TRUNCATED in C2, and AdvP_FULL_CLAUSE in C1 and AdvP_FULL_CLAUSE in C2 being the three most strongly associated patterns with collostructional strength values >29, i.e., p ≪ 0.001). Of the remaining eight, six at least share one feature (either DELETION or FILLER TYPE across C1 and C2). As these results show, neither of these two variables is exclusively associated with a particular feature of the other variable, but the iconic parallel semantics seem to have supported the entrenchment of various parallel structures across C1 and C2.

Discussion
In the following, we are going to present an analysis of the empirical results that not only sheds more light on the English CC meso-constructional network but also answers important questions concerning the relationship between C1 and C2.
Concerning the order of C1 and C2, the absolute frequencies determined in Section 4.1 reveal a clear preference for the iconic C1C2 over C2C1, with a ratio of 15:1. Note that this ratio is lower than the 37:1 for C1C2 over C2C1 reported in a previous study of the BROWN family of corpora (Hoffmann 2019: 193). Nevertheless, we can still speak of a strong tendency toward the iconic C1C2 order. The low frequency of C2C1 structures can thus largely be explained by iconicity.
Next, we turn to the question of whether C1C2s are a hypotactic or paratactic structure in present-day English. As mentioned in Section 2, previous studies differed in their opinion concerning the grammaticality of hypotactic features in present-day English CCs. First, there are conflicting views about the possibility of that-complementizers in C2 clauses and, second, the status of SAI in C2 has not been conclusively decided, with only vague assertions such as that the latter is "marginal" (Culicover and Jackendoff 1999: 559) or "disfavoured" (Hoffmann 2014a: 94). As the absolute frequencies from the BNC data show, we can now confirm that that-complementizers, despite their marginal occurrence of just [...] receives further support from the findings on the variables filler types and BE-deletion/truncation phenomena.

Footnote 16: 6 × 6 = 36 configurations were tested in Table 11, corresponding to a Bonferroni-corrected p value of 0.05/36 = 0.001388889, which translates into a collostructional strength value of 2.86 (since log10(0.001388889) = −2.857332). The only symmetric structure that is positively associated in Table 11 and is no longer significant after this conservative correction for multiple testing is BE-RETAINED AUX C1 - BE-RETAINED AUX C2; all other attraction effects remain significant.
The results of the covarying-collexeme analysis revealed at least three statistically significantly associated filler-type combinations (AdvP C1 - AdvP C2, AdjP C1 - AdjP C2, and NP C1 - NP C2) and five statistically significantly associated deletion phenomena combinations (BE-DELETED MV C1 - BE-DELETED MV C2, TRUNCATED C1 - TRUNCATED C2, FULL CLAUSE C1 - FULL CLAUSE C2, BE-RETAINED MV C1 - BE-RETAINED MV C2, and BE-RETAINED AUX C1 - BE-RETAINED AUX C2). Moreover, several filler-type and deletion patterns are together significantly associated across C1 and C2. Based on our statistical analysis, these combinations can therefore be considered entrenched as meso-constructions in the English CC network. This, consequently, corroborates the findings of previous research that found exactly the same five meso-constructions based on data from a different corpus, the COCA (Hoffmann 2019). What is striking is that all of the statistically significant cross-clausal associations are symmetric, despite the many other combinations that are possible and were indeed encountered in the corpus data (attesting to the productivity of the CC construction). This is, therefore, clear evidence in support of our claim that the central properties of Modern English CCs are paratactic, not hypotactic.
Finally, the productivity of CCs indicates that we still have to postulate a maximally abstract macroconstruction such as (2) to account for all the various observed variable combinations. At the same time, our usage-based approach supports Hoffmann et al.'s view (2019) that, in addition to this, the English taxonomic CC network also contains the above meso-constructions with strong parallel features.

Conclusion
The present large-scale corpus study has provided new insights into the various phenomena of English CCs. Our analysis confirmed the findings of previous studies but also uncovered new, hitherto unknown, facts about the English CC:
• Concerning the order of C1 and C2, we have been able to show that the iconic C1C2 structure is strongly preferred over the C2C1 structure, with a ratio of 15:1. Since focus particles appear to be acceptable only in C2 in C2C1 structures, we assume that C2C1s encode a pragmatic, focusing function.
• Furthermore, the present study investigated that-complementizers and SAI. Both of these features can be found in present-day English CCs, albeit with very low frequencies. This suggests that these two features are no longer central properties of the English CC construction, which appears to be significantly more paratactic in nature, as suggested by the symmetric cross-clausal associations that were determined using statistical analyses. In line with Goldberg's Principle of No Synonymy (1995: 67), we tentatively raised the hypothesis that both these features have a pragmatic function of expressing focus. Yet, these claims clearly require future empirical corroboration.
• Finally, the covarying-collexeme analyses of filler types and deletion phenomena confirm the findings of Hoffmann et al.'s COCA corpus study (2019), i.e., cross-clausal C1C2 associations that are evidence for entrenched meso-constructions. These cross-clausal attraction phenomena could only be found for symmetric structures. Significant repulsion was only found for asymmetric structures.
The implications of the above results are twofold: first, they provide further evidence that CCs are paratactic rather than hypotactic in nature, thus encoding the symmetric semantics of CCs. The use of hypotactic features might be explained by pragmatic functions, but this is something that future studies will have to investigate in more detail.
Second, since they cannot be explained by many previous approaches that treat C1 and C2 as two independent structures that are licensed separately from each other, our data offer further support for Hoffmann et al.'s (2019: 32) assumption that meso-constructional templates play a significant role in the present-day CC construction network.