Linguistic complexity in second language acquisition

Since the 1990s, linguistic complexity has become an important issue in second language acquisition (SLA) research and teaching: second language (L2) learners want to know how well they are progressing, while teachers and researchers are interested in finding out which degree of complexity can be associated with a particular proficiency level. After a short sketch of the background to the construct of complexity, the paper presents an overview of how complexity is measured in SLA, how it is related to other constructs of language proficiency (in particular accuracy and fluency), and which factors may affect complexity: these concern both internal linguistic factors and external factors, such as task-related features and type of instruction. The paper concludes with directions for future research, focusing on the need for non-redundant, valid and reliable measures, more developmental measures, a broader scope of complexity, combined cross-linguistic and longitudinal research, and more research in instructional practice.


Introduction
In second language (L2) teaching, both L2 learners and teachers want to be regularly informed about the progress made during the second language acquisition (SLA) process. The same holds for SLA researchers, who try to determine the optimal conditions for an L2 learner to become a proficient language user. The question, then, is which developmental index is best at measuring L2 proficiency, i.e., which is the most valid, reliable and feasible.
Research has shown that language proficiency is not a unitary construct, but can rather be split into various components. Assessment of language proficiency used to be based on the traditional four-skills model (listening, speaking, reading and writing) and on sociolinguistic and cognitive models of L2 proficiency (e.g., Bachman 1990; Canale and Swain 1980), with an initial focus on accuracy. In the 1980s, a distinction was made between accurate and fluent language use. Complexity was added as a third component in the 1990s, following Skehan (1989), who proposed an L2 model which included complexity, accuracy and fluency (CAF) as the three principal dimensions of proficiency. Since then, the dimensions of the CAF triad have figured as major research variables in applied linguistic research in general, and especially in SLA research and teaching.
This paper focuses on complexity in SLA, paying particular attention to how complexity has been operationalized and approached in SLA research.

The construct of complexity
As illustrated by the papers in this special issue, complexity can be approached from diverse linguistic (sub)disciplines, including theoretical linguistics, language evolution, comparative linguistics, language typology, computational linguistics, psycholinguistics, neurolinguistics, etc. (Housen et al. 2019a, 2019b; Kortmann and Szmrecsanyi 2012; Newmeyer and Preston 2014; Von Prince and Kilarski 2021). These various perspectives make it apparent that there is no central theory of complexity or agreed-upon measures to evaluate complexity. This is also the case within the field of SLA. In SLA research, the term 'complexity' has often been used with different meanings across studies, which limits the comparability of these studies and may explain why inconsistent findings have been reported (Kuiken 2009a, 2009b; Norris and Ortega 2009). These studies also illustrate that, similar to language proficiency, complexity is multilayered, multifaceted and multidimensional in nature.
In the SLA literature, the term 'complexity' is used in at least two different ways: as cognitive complexity and as linguistic complexity (Housen et al. 2012). Cognitive complexity (or difficulty) is a relative notion. It refers to the relative difficulty with which language elements are processed, as determined by the learners' individual backgrounds, for instance, their aptitude, memory capacity, motivation and level of L2 proficiency. Linguistic complexity, also known as absolute complexity, refers to the intrinsic formal or semantic-functional properties of L2 elements (e.g., forms, meanings, and form-meaning mappings) or to properties of (sub)systems of L2 elements that are independent of the learner, such as saliency, input frequency, redundancy and L1-L2 similarity.
Within linguistic complexity, grammatical complexity is distinguished from lexical complexity. Grammatical complexity can be further dissected into syntactic complexity (at the sentential, clausal and phrasal levels) and morphological complexity (inflectional and derivational). Within lexical complexity, a distinction is often made between diversity (the number of different words), density (the proportion of lexical words) and sophistication (the number of less frequent words). Both cognitive complexity and linguistic complexity determine whether L2 learning is more or less cognitively challenging (Bulté and Housen 2012; Housen and Simoens 2016).
Given the various ways in which the term complexity is interpreted, it is not surprising that there is no single, generally accepted definition of complexity. According to Wolfe-Quintero et al. (1998: 69, 101) "grammatical and lexical complexity mean that a wide variety of both basic and sophisticated structures and words are available to the learner". Ellis and Barkhuizen (2005: 139) define complexity as the "use of more challenging and difficult language … complexity is the extent to which learners produce elaborated language". For Iwashita et al. (2008: 32), complexity "refers to characteristics of utterances at the level of clause relations, that is, the use of conjunctions and, in particular, the presence of subordination".
Despite differences in formulation, these definitions illustrate that complexity is generally interpreted as a quantitative notion with respect to the number and variety of parts or elements in an entity or system, as well as to the relationships and interactions between the constituent parts. These definitions also demonstrate that earlier L2 research has focused on syntactic and lexical forms of complexity, which is why many of the references in this paper are restricted to these two types of complexity. According to Bulté and Housen (2012: 34), this has led to "a rather narrow, reductionist, perhaps even simplistic view on and approach to what constitutes L2 complexity". However, as we will see in the next section, other components of complexity have also received attention more recently.

Measuring complexity in SLA
In addition to a lack of consensus and consistency across L2 studies in how complexity has been conceptualized and defined as a construct, there are also problems and inconsistencies in how empirical studies have operationalized and assessed complexity. A central problem concerning the operationalization of complexity is how it can be measured validly, reliably and efficiently. Over the years, a wealth of different measures has been proposed. These measures range from holistic and subjective ratings by lay or expert judges, who assign a single score to a speech or text sample based on the overall impression of the performance, to objective, quantitative measures (frequencies, ratios, indices) of L2 production (Ellis and Barkhuizen 2005; Wolfe-Quintero et al. 1998).
With respect to linguistic complexity, early L2 research was restricted to grammatical and lexical complexity (Bulté and Housen 2012). Examples of measures of overall syntactic complexity are: mean length of utterance, T-unit, C-unit or AS-unit. A coordination index, such as the number of coordinated clauses divided by the total number of clauses, is used for coordination. Measures for assessing subordination are: number of subordinate clauses; number of clauses per T-unit, C-unit or AS-unit; number of subordinate clauses per clause, dependent clause or T-unit. Other, more specific measures include frequency of passive forms, infinitival phrases, conjoined clauses, imperatives, auxiliaries, comparatives, conditionals, etc. With respect to morphology, measures for inflectional morphology (e.g., number of tensed forms, modals or different verb forms) and for derivational morphology (e.g., frequency of affixation) are used. Measures for assessing lexical complexity include measures of lexical diversity (e.g., TTR, Guiraud's index or D-value), measures of lexical density (e.g., number of lexical words per total words or per function words), and lexical sophistication (e.g., number of less frequent words per total words).
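To make the lexical measures listed above concrete, the following is a minimal sketch in Python of three of them: the type-token ratio (TTR, types divided by tokens), Guiraud's index (types divided by the square root of tokens) and lexical density (lexical words divided by total words). The tokenizer, the function name `lexical_measures` and the very small function-word list are illustrative assumptions, not part of any study cited here; actual research would normally rely on a part-of-speech tagger and lemmatization.

```python
import math
import re

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration only); real studies identify lexical words with a POS tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "are", "was", "were", "it", "that", "this",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_measures(text):
    """Return token/type counts, TTR, Guiraud's index and lexical density."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no word tokens")
    types = set(tokens)
    lexical_tokens = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        "tokens": len(tokens),
        "types": len(types),
        # Type-token ratio: sensitive to text length.
        "ttr": len(types) / len(tokens),
        # Guiraud's index: types over the square root of tokens,
        # intended to reduce the text-length effect.
        "guiraud": len(types) / math.sqrt(len(tokens)),
        # Lexical density: share of tokens that are not function words.
        "density": len(lexical_tokens) / len(tokens),
    }


if __name__ == "__main__":
    sample = "The cat saw the dog and the dog saw the cat"
    for name, value in lexical_measures(sample).items():
        print(name, value)
```

Because TTR falls as texts get longer, values from this sketch are only comparable across samples of similar length, which is one reason why length-corrected indices such as Guiraud's index or the D-value are used.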
Criticisms were soon raised against this reductionist approach to complexity, and gaps and imbalances in complexity measurement in L2 research were identified (Bulté and Housen 2012; Norris and Ortega 2009; Pallotti 2009, 2015). Firstly, because some of these global, overall measures were judged to be too coarse, more fine-grained measures have been proposed that address other syntactic levels (at phrasal level, for instance, the number of verb and noun phrases or the number of premodifying and postmodifying noun phrases), different types of subordination (causal, temporal, hypothetical, etc.), and measures that distinguish nominal subordination from subordination via subject/object relative clauses (Larsen-Freeman 2009; Pallotti 2009; Robinson et al. 2009). For an overview of these types of measures and studies in which they have been used, see, for example, Bulté and Housen (2012).
Secondly, it is questionable whether the entire L2 developmental process can be captured in terms of overall length measures and subordination ratios, so other measures, which may reveal development at different levels of proficiency, should be applied. As stated by Ortega (2003), beginner and intermediate L2 learners may prefer complexity by coordination and subordination, while phrasal complexity may be favoured at more advanced levels of L2 proficiency. At university level, Biber et al. (2016) observed that as academic level increased, so did the use of phrasal complexity features in writing. On the other hand, the use of clausal complexity features in student writing, particularly finite dependent clauses, decreased as academic level increased.
Thirdly, by relying only on measures targeting grammatical and lexical complexity, other domains that could provide valuable information have been discarded. Recently, however, attempts to assess complexity in linguistic domains other than syntax and lexis have been proposed; these include propositional complexity, phraseological complexity and morphological complexity.
Propositional complexity refers to the amount of information, expressed as the number of idea units, which a speaker or writer encodes to convey the intended message. Vasylets and Manchon (2019) explored if and how propositional complexity was moderated by the modality in which a task was performed. The participants were 290 Spanish/Catalan university students of English as a second language. Propositional complexity was operationalized in terms of length of idea units, number of idea units and ratio of extended idea units. It turned out that the ideas expressed in speech were longer than in written text, whereas writing contained more extended ideas than speaking. No significant difference was found for the number of idea units.
Paquot (2019) focused on the phraseological dimension in interlanguage complexity research. She investigated to what extent measures of phraseological complexity can be used to describe L2 performance at different proficiency levels and compared measures of phraseological complexity with traditional measures of syntactic and lexical complexity. The study revealed that, unlike traditional measures of syntactic and lexical complexity, measures of phraseological sophistication can be used to describe L2 performance at higher proficiency levels (levels B2-C2 of the Common European Framework of Reference for Languages [CEFR]). This suggests that essential aspects of language development from upper-intermediate to very advanced proficiency are situated in the phraseological dimension.
Finally, Brezina and Pallotti (2019) introduced the Morphological Complexity Index (MCI), which measures the average inflectional diversity for the occurrences of a given word class in a text. They have tested the measure in two case studies, based on argumentative written texts produced by native and non-native speakers of Italian and English. De Clercq and Housen (2019) showed that the use of the MCI is promising, especially compared with existing approaches to calculating morphological complexity, although the measure tends to level off at higher proficiency levels.
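The idea of averaging inflectional diversity over the occurrences of a word class can be illustrated with a simplified sketch in Python. It is not the published MCI formula: it only draws random fixed-size samples from a list of inflectional exponents (for instance, verb endings that have already been extracted by hand or by a tagger) and averages the number of distinct exponents per sample. The function name `mean_exponent_variety`, the sample size, the number of samples and the seed are all assumptions made for illustration.

```python
import random


def mean_exponent_variety(exponents, sample_size=10, n_samples=100, seed=0):
    """Average number of distinct exponents in random fixed-size samples.

    `exponents` is a list with one inflectional exponent per occurrence of
    the word class under study (e.g. ["-s", "-ed", "-ing", "-ed", ...]).
    This is a simplified proxy and NOT the MCI as defined by its authors.
    """
    if len(exponents) < sample_size:
        raise ValueError("need at least `sample_size` exponents")
    rng = random.Random(seed)  # fixed seed so that results are repeatable
    total = 0
    for _ in range(n_samples):
        # Sample without replacement, then count distinct exponent types.
        sample = rng.sample(exponents, sample_size)
        total += len(set(sample))
    return total / n_samples


if __name__ == "__main__":
    # Invented toy data: 20 verb occurrences with their endings.
    verb_endings = ["-s", "-ed", "-ing", "-ed", "-0"] * 4
    print(mean_exponent_variety(verb_endings))
```

Using fixed-size samples keeps the score from rising simply because a text is longer; a text with a single recurring ending scores 1, and the score can never exceed the sample size, which is consistent with the levelling-off reported at higher proficiency levels.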
With the advance of automated complexity and natural language processing tools, the repertoire of complexity measures has grown substantially (e.g., Coh-Metrix, Graesser et al. 2004; Kyle et al. 2018). These automated tools include a wide variety of both overall and more fine-grained measures, targeting more specific and linguistically sophisticated or developmentally advanced features (e.g., ratios of rare words, relative clauses or noun premodifiers). They make it possible to assess the complexity of a text in a relatively short time, sometimes by means of more than 100 measures. What remains to be seen is the extent to which these measures are redundant, how valid and reliable they are, and to what extent they can function as an index of L2 development.

Interaction of complexity with other components
As complexity is not the only factor which can account for processes in SLA, it is useful to study the interaction of complexity with other components, in particular accuracy and fluency. Ellis (1994) speculated that an increase in fluency could occur at the cost of development of accuracy and complexity, due to both the differential development of knowledge analysis and knowledge automatization in L2 acquisition and the ways in which different forms of implicit and explicit knowledge influence L2 development. Researchers who subscribe to the view that the human attention mechanism and processing capacity are limited (e.g., Skehan 1998) argue that L2 learners must prioritize where they allocate their attention during performance, so that attention allocated to one dimension of language production will be lost on others. According to this assumption, which is embodied in Skehan's (2009) Trade-off Hypothesis, fluency may compete for attentional resources with accuracy, while accuracy in turn competes with complexity. Robinson (2003) proposes a different view with his Cognition Hypothesis, which claims that learners can simultaneously access multiple and non-competitional attentional pools. As a result, manipulating task complexity by increasing the cognitive demands of a task can lead to simultaneous improvement of complexity and accuracy. As denoted by the term Cognition Hypothesis, cognitive complexity comes into play here.
Testing these two rival models has produced mixed results. Whereas Skehan (2009) observed trade-off effects between syntactic complexity and accuracy, other studies have demonstrated that syntactic complexity develops simultaneously with other CAF dimensions as learners' overall proficiency grows (e.g., Robinson 2011; Spoelman and Verspoor 2010). What research has pointed out is that complexity, accuracy and fluency do not develop collinearly in SLA; instead, they interact in intricate ways and this interaction is sometimes mutually supportive and sometimes competitive (Larsen-Freeman 2006; Spoelman and Verspoor 2010). This has led Larsen-Freeman (2009) to assume that studying the CAF components individually does not bring us much closer to finding out what effect each of these components has on learner performance in a linear causal way. Such a reductionist approach does little to advance our understanding, as we risk ignoring their mutual interaction. Instead, we should try to capture the development of multiple subsystems over time and in relation to each other.

Factors affecting complexity
The degree of complexity expressed in L2 users' language production may be influenced by both internal linguistic factors and external factors. Internal linguistic factors include linguistic features such as items, patterns, constructions, rules, L1 background, cross-linguistic variation and differences between native and non-native speakers. External factors refer to learner variables, such as personality (e.g., extraversion, anxiety), socio-psychological features (e.g., motivation, language aptitude), task-related characteristics (e.g., task type and genre, type and amount of planning) and features of pedagogic intervention (e.g., type of instruction). As a result, linguistic complexity and cognitive complexity may interact with each other. Due to space limitations, we cannot go into all these factors. For internal factors, we will elaborate on cross-linguistic influences, while for external factors we will focus on task variables and type of instruction.

Internal factors: cross-linguistic influences
Recently, some studies have appeared which underscore the relevance of cross-linguistic influences on complexity in L2 production. Through an investigation of learner texts from the International Corpus of Learner English (ICLE), Ehret and Szmrecsanyi (2019) showed how L1 background may contribute to L2 complexity. They found that L2 English essays written by L1 German learners tended to be more complex in terms of overall complexity and morphological complexity than essays written by L1 French, Italian and Spanish learners. A similar effect of L1 background was found by Van der Slik et al. (2019). In a study on the acquisition of Dutch L2 by almost 9,000 adult learners from 33 different language backgrounds, they demonstrated that the less morphologically complex the learners' L1 was, the more difficulty they had in acquiring Dutch.
Another type of cross-linguistic variation was observed by . They found variation in the gradual syntactic complexification across proficiency levels and languages: advanced Italian L2 learners (L1: Dutch) used more coordinate structures within T-units, more relative clauses and longer postmodifying noun groups, whereas this was not the case for Dutch L2 learners (L1: various languages) and Spanish L2 learners (L1: Dutch). At the same time, variation between L2 learners and native speakers was also observed: native speakers of Italian used longer postmodifying phrases than Italian L2 learners, while native speakers of Spanish used more relative clauses than Spanish L2 learners. In this type of research, data from native speakers of the target language are of crucial importance as they constitute a baseline against which L2 learners can be compared. Unfortunately, such data are not always present in L2 complexity research.

External factors: task variables
The external factors that have attracted most attention in recent years are task variables. Using a number of selected studies, we will demonstrate how complexity is affected (or not) by some of these task-related features.

Planning time and task complexity
Ellis (2009) investigated the effect of three types of planning on CAF: rehearsal (performing the complete task once before performing it a second time), strategic planning (planning what content to express and what language to use before performing the task, but without opportunity to rehearse the complete task), and within-task planning (any type of planning while performing the task). He concluded that all three types of planning had a beneficial effect on fluency but found mixed results for complexity and accuracy. Rehearsal provided an opportunity for learners to attend to conceptualization, formulation and articulation, therefore benefiting all three dimensions of the CAF triad in L2 production. Strategic planning assisted conceptualization in particular, and thus contributed to enhanced fluency and greater message complexity. Within-task planning benefited complexity and accuracy without having a detrimental effect on fluency.
Robinson et al. (2009) studied the effect of increasing the complexity of task demands in two conceptual domains (time and motion) in L2 speech production, using specific measures of complexity and accuracy. Results showed more developmentally advanced use of tense-aspect morphology on conceptually demanding tasks compared to less demanding tasks, and a trend towards more target-like use of lexicalization patterns for referring to motion on complex tasks. In line with the Cognition Hypothesis, the authors concluded that pedagogic tasks should be sequenced for learners in an order of increasing cognitive complexity.
Contrary to Robinson's predictions, Kuiken and Vedder (2012b) demonstrated that although manipulation of certain task characteristics might stimulate the production of particular linguistic features, no generalized effect of task complexity on linguistic complexity could be detected. The main influence of task complexity was to be found on accuracy, whereas no influence on syntactic complexity and lexical variation could be established. De Jong et al. (2012) investigated how task complexity affected native and non-native speakers' speaking performance. With respect to lexical complexity, both native and non-native speakers produced a wider range of words in complex tasks compared to simple tasks.

Task type, genre and modality
Different task types, such as instructional, descriptive, argumentative and problem-solving tasks, may lead to an increase (or decrease) in syntactic complexity. In a longitudinal study on interlanguage variation in L2, Ferrari (2012) compared four L2 learners of Italian (adolescents from different linguistic backgrounds) with two native speakers of Italian, who performed four oral tasks four times at yearly intervals. An effect of task type could be established, as subordination ratios tended to decrease with more interactive tasks, particularly at higher proficiency levels in L2 and in L1.
Differences can also be observed when syntactic complexity is assessed in different genres, like newspapers, narratives, argumentative essays, monologues or interactive speech. Yoon and Polio (2017), for instance, found higher scores for syntactic complexity in argumentative essays compared to narratives.
With respect to task modality,  and  found that speaking and writing may differ in syntactic complexity in various ways. Clausal subordination appeared to be rather common in daily conversation, in contrast to the frequent use of complex noun phrase constituents and complex phrases in academic writing. Kuiken and Vedder (2011, 2012a) observed higher syntactic complexity in writing than in speaking. These findings were corroborated by Vasylets and Manchon (2019), who found higher scores on syntactic and lexical complexity in written texts. As mentioned in Section 3, differences were also observed in the way speakers and writers conveyed the propositional content of the task. The findings of the study were interpreted as evidence of the facilitating conditions for restructuring during written production in instructed settings and, accordingly, of the language learning potential of L2 writing tasks.

External factors: type of instruction
For second language teaching, it is interesting to know if instruction can affect complexity in L2 learners, and, if so, what type of instruction is most effective. Studies with that particular aim may help to bridge the gap between SLA research and SLA classroom practice. Bulté and Housen (2019) analysed the effect of a Dutch-English Content and Language Integrated Learning (CLIL) program versus a mainstream program with English as a Foreign Language (EFL) teaching on the development of different aspects of L2 learners' lexical and grammatical complexity. It turned out that both groups of learners significantly increased the complexity of their L2 writing over the course of the study, although there was a high degree of intra- and inter-learner variability. Only limited effects of program type (CLIL versus non-CLIL) were found, suggesting that increased and more varied instructional exposure to the L2 in the CLIL program did not lead to significantly different L2 productions in terms of linguistic complexity.
In a study conducted among secondary school students in the Netherlands, Rousse-Malpat et al. (2019) explored the effects of explicit and implicit instruction in L2 French on linguistic complexity measures. The explicit treatment included a traditional focus on explicit grammar. The implicit group was taught using the Accelerated Integrated Method, a highly communicative, meaning-focused method without explicit instruction, but with a great deal of exposure and repetition to induce frequency effects. Results after three years showed that implicit instruction led to better writing complexity at various morpho-syntactic levels, but also to increases in text length and the use of short formulaic routines. No differences were found for lexical complexity.
Kuiken and Vedder (2019a) examined how teachers of two different target languages (Dutch and Italian) perceived syntactic complexity in L2 writing, if and how their perceptions differed in these two languages, and how teachers' judgments were related to the development of syntactic complexity as hypothesized in the SLA literature. The results revealed that teachers tended to focus primarily on accuracy and comprehensibility. When they did focus on syntactic complexity, there were both similarities and differences between the teachers of Dutch and Italian, possibly related to the target language, while teachers' reflections appeared to be only partly related to the hypothesized development of syntactic complexity in the SLA literature.

Future directions
In this paper, we have explored complexity from the perspective of SLA. Although this overview demonstrates how well the concept of complexity (especially syntactic and lexical complexity) is anchored in both SLA research and teaching, there remain domains which need further investigation and refinement. These areas include the need for (1) non-redundant, valid and reliable measures; (2) developmental measures; (3) a broader scope of complexity; (4) combined cross-linguistic and longitudinal research; and (5) research in instructional practice. In what follows, we will briefly elaborate on these five areas.

The need for non-redundant, valid and reliable measures
In SLA research, a large variety of complexity measures have been used, both general and more specific. As pointed out by Norris and Ortega (2009), many of these measures are redundant, i.e., they measure exactly the same thing. It is now time to further refine testing instruments and find the right balance between an acceptable number of valid measures (which measure what they are supposed to measure) and reliable measures (which yield the same result when the research is repeated under the same conditions) that can explain variation in complexity. It is promising that automated measures of complexity have been proposed, as well as new measures that take into account earlier neglected forms of complexity. This can be promoted by interdisciplinary research in which complexity is investigated from various perspectives. For example, Van der Slik et al. (2019) use information from the World Atlas of Language Structures (WALS; Dryer and Haspelmath 2013) to evaluate from a typological perspective how the presence or absence of (morphologically) complex features in languages influences their cognitive learning complexity. Further typologically inspired work has been undertaken by Ehret and Szmrecsanyi (2019), who adopt an information-theoretic approach to L2 complexity by employing the compression technique and text-deformation method of measuring linguistic complexity based on the formalism of Kolmogorov complexity.
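The intuition behind the compression technique can be shown with a short sketch in Python. Kolmogorov complexity itself cannot be computed, so the size of a compressed text is taken as an approximation: a text that compresses well is more repetitive and, in this sense, less complex. The sketch uses the standard-library `zlib` module and a hypothetical function `compression_ratio`; it covers only overall compressibility and does not reproduce the text-deformation step, nor the exact procedure of the study cited above.

```python
import zlib


def compression_ratio(text):
    """Compressed size divided by original size (in bytes).

    Lower values mean the text is more predictable and repetitive.
    An approximation in the spirit of Kolmogorov complexity, not the
    procedure of any particular study.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text is empty")
    compressed = zlib.compress(raw, 9)  # 9 = maximum compression level
    return len(compressed) / len(raw)


if __name__ == "__main__":
    # Invented toy texts for illustration only.
    repetitive = "the cat sat on the mat. " * 40
    varied = ("Second language learners gradually produce longer clauses, "
              "rarer words and more varied phrases.")
    print(compression_ratio(repetitive))
    print(compression_ratio(varied))
```

Raw ratios depend strongly on text length, because very short texts carry a fixed compression overhead, so comparisons are only meaningful between samples of similar size.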

The need for developmental measures
In addition to the need for non-redundant, valid and reliable measures, there is a demand for more developmental measures which can describe performance at all levels of proficiency. Pallotti (2009) has pointed out that at advanced levels of L2 proficiency, more does not always mean better. Some measures may level off, e.g., mean length of utterance and morphological complexity, as measured by the Morphological Complexity Index (Brezina and Pallotti 2019). At these later stages (CEFR levels B2-C2), other measures may be more suited to describe L2 performance, e.g., phraseological sophistication (Paquot 2019).

The need for a broad scope of language complexity
As mentioned before, complexity should be studied in combination with other components of language proficiency, in an attempt to capture the development of multiple subsystems over time and in relation to each other. There is a need for studies on complexity which extend traditional conceptualizations of complexity as syntactic or lexical complexity by focusing on other forms and manifestations of complexity, in particular instances of complexity that arise at the interface between syntax and lexis, i.e., phraseological complexity, phonological complexity, and discourse-interactional complexity. It is also encouraging to notice a shift from the more traditional focus on small convenience learner corpora representing individual learner development to the measurement of complexity in larger learner corpora. At the same time, an interesting paradox has been signalled, namely that most studies have assessed CAF within the contexts of communicative tasks, but very few discuss how the communication unfolded and whether it was successful in achieving its goals. It has therefore been argued that, alongside CAF, functional adequacy (in terms of successful fulfilment of a particular task) should be included as a separate dimension of L2 production and proficiency (Kuiken and Vedder 2017, 2022; Pallotti 2015).

The need for combined longitudinal and cross-linguistic research
The majority of studies on complexity in SLA have used a cross-sectional design, which makes it difficult to detect developmental patterns of complexity. Therefore, cross-sectional studies must be complemented by, or better still, combined with longitudinal studies in order to identify generalizable patterns of how linguistic complexity develops. The increase in syntactic complexity at the group level seems to be fairly linear, but at the level of the individual learners, there is a high degree of variability: the individual learners follow different developmental paths that often do not coincide with the observed mean group trends (De Clercq and Housen 2019; Lahmann et al. 2019; Spoelman and Verspoor 2010). It is therefore recommended to combine group studies with longitudinal case studies. As languages may not be equally complex (cf. Dahl 2011; Ehret and Szmrecsanyi 2016), others have shown the relevance of adopting a cross-linguistic perspective by examining the impact of complexity configurations of the L1 and L2 and differences between native and non-native speakers (Brezina and Pallotti 2019; De Clercq and Housen 2019; Van der Slik et al. 2019). Another important issue is the need to extend the range of target languages. The number of target languages has grown over the years, but there is still a bias towards English and other Indo-European languages, mostly Germanic and Romance languages.

The need for research in instructional practice
Finally, there is a need for research which investigates more thoroughly how teachers deal with complexity in instructional practice, as knowledge about complexity may help language teachers to improve their lessons and show them when and how to stimulate complexity in language learners. Although complexity does not seem to be the primary focus of teachers when assessing L2 learners' proficiency, it is crucial for teachers to understand that complexity development should also be an important pedagogical goal. This has been proposed by Norris and Ortega (2009), who hypothesize a developmental order running from coordination via subordination to phrasal complexity. Students in higher education should also be taught that (written) scientific texts differ from other text genres, e.g., with respect to the use of passive constructions and a higher use of nominalizations, often combined with pre- and postmodifying phrases (Biber 2006; Hyland 2009). The adequate use of such forms is a process of trial and error. Syntactic and lexical errors are part of the process of L2 acquisition. More complex tasks lead to linguistically more complex language, and "errors" are thus a necessary prerequisite for L2 development: you've got to crack a few eggs to make an omelette (De Graaff 2019).