Variability as normal as apple pie

In recent studies in second language (L2) development, notably within the focus of Complex Dynamic Systems Theory (CDST), non-systematic variation has been extensively studied as intra-individual variation, which we will refer to as variability. This paper argues that variability is functional and is needed for development. With examples of four longitudinal case studies we hope to show that variability over time provides valuable information about the process of development. Phases of increased variability in linguistic constructions are often a sign that the learner is trying out different constructions, and as such variability can be evidence for change, and change can be learning. Also, a limited degree of variability is inherent in automatic or controlled processes. Conversely, the absence of variability is likely to show that no learning is going on or the system is frozen.


Introduction
The title of this article is related to the presentation one of the authors, Kees de Bot, gave at the conference "Intra-speaker variation across time - Sociolinguistics meets psycholinguistics" held in Salzburg in February 2019, organized by the editors of the current special issue. In the paper he argued that variability is inherent in all processes and that stability (something being exactly the same as before) hardly ever occurs, and he gave an extended example of baking apple pies. In other words, variability is as common or normal as apple pie. According to the editors of this special issue, in many areas of linguistics rather limited attention has been paid to this so-called non-systematic or free variation. However, in recent studies in second language (L2) development, notably within the focus of Complex Dynamic Systems Theory (CDST), non-systematic variation has been extensively studied as intra-individual variation, which we will refer to as variability. The paper first explains how we define free variability and why it is functional. Then with examples of four longitudinal case studies we hope to show that variability over time can provide valuable information about the process of development. Phases of increased variability in linguistic constructions are often a sign that the learner is trying out different constructions, and as such variability can be evidence for change, and change can be learning. Also, a limited degree of variability is inherent in automatic or controlled processes. Conversely, the absence of variability is likely to show that no learning is going on or the system is frozen.

Variability as source of information
The starting point in a Complex Dynamic Systems Theory (CDST) approach to language development is that variability, even though seemingly random, provides a powerful source of information and is as common as apple pie. It has been investigated successfully in second language (L2) developmental studies since the beginning of this century. Thelen and Smith (1994) were some of the first to point out that variability in human development is actually functional. In their view, development is an individual and rather erratic discovery process. The learner must discover, try out, and practice each part of the process himself or herself, and this is accompanied by a great deal of trial and error, referred to as variability, which is functional as a principal component of the learning process. This implies that the degree of variability should be considered as information about the process rather than as something to be explained in terms of what has caused it. Thelen and Smith (1994) argue that variability is especially large during periods of rapid development as the learner explores and tries out new strategies or modes of behaviour, which may not always be successful and thus alternate with old strategies or modes of behaviour. Verspoor and van Dijk (2013) argue that the cause and effect relationship between variability and development should be considered reciprocal (see also Larsen-Freeman and Cameron 2008). This means that causes shape effects, a process which directly or indirectly reshapes the causes. In this mutual process both the cause and the effect are at the same time the antecedent and the consequence of the other. Deciphering these cycles is a challenging task, as in a CDST perspective there is no mono-causality. No single and unidirectional causality is likely to occur. The role of variability in this process is that it licenses flexible and adaptive behaviour, which is needed on an everyday basis in an ever-changing context and is needed even more so for development; especially in development, no new behaviour can be expected to occur when there is no variation. It is the free exploration of performance that generates variability. When a learner tries out new tasks the system becomes less stable, which leads to an increase in variability. Therefore, the claim is that stability and variability are indispensable aspects of human development.
The idea of variability is not new to the field of L2 learning and was acknowledged and studied especially in the 1980s. Inspired by variationist research in sociolinguistics (Labov 1963), these studies focused predominantly on specific causes of this variability, such as task conditions. In a synthesis of these studies, Tarone (1988) concluded, however, that not all causes of variability can be explained; she called for more longitudinal studies to "show the way in which variation at a single point in time is tied to longitudinal development of an interlanguage" (Tarone 1988: 137). To the best of our knowledge, the first scholar who speculated about the function of variability over time was Ellis (1994), who referred to free variability, that is, variability not caused by any known linguistic, situational, or psychological factor. Ellis referred to six Spanish learners of English as an L2 who were studied over a period of 10 months by Cancino et al. (1978). Each learner made use of a variety of forms to express negation at different stages of development. Ellis speculated that such free variability occurs only during an early stage of development and then disappears as learners develop a better-organized L2 system (Ellis 1994: 137). However, when L2 development is viewed as the result of the complex and dynamically changing interaction of all influences that are relevant in this process, variability is the inherent manifestation of this development and can provide information about the underlying process. Therefore, developmental variability is what CDST studies aim to explore by examining patterns of variability in individuals over time.
Before discussing our own findings related to variability and development in L2 development from a CDST perspective, we will first try to provide information on the occurrence of variability and the reasons for its occurrence with an apple pie example as metaphor. Let's say an inexperienced apple pie baker, called Kees, is tracked over the course of one year as he tries to improve his skills by baking one apple pie every week. Each week, Kees will have exactly the same ingredients in exactly the same amounts in exactly the same kitchen at his disposal. However, 'exactly the same' is inherently impossible. None of the ingredients can be exactly the same, as they have already been used previously, and the kitchen or oven may be slightly warmer or colder on different occasions. Also, when the ingredients are mixed, they may not be mixed in the exact same order or for exactly the same amount of time. Finally, rolling out the dough, putting it in the baking dish and so on may be slightly different each time. All these slightly different ingredients and processes result in naturally occurring variability in the quality of the apple pie. It inherently can never be exactly the same. In addition, our novice apple pie baker will probably do his best to improve his apple pie and experiment a bit, especially early on. Perhaps it is better to let the dough rest for 1 h in the fridge. Perhaps it is better to pre-heat the oven better. Perhaps a little less sugar and a bit more lemon will make the apples tarter. This endeavour in experimenting and trying to improve may cause more fluctuation (which we call free variability) in quality at the time. We would especially expect this to happen early on in Kees' apple-pie baking career as the apple-pie baking processes are not automated yet, but knowing Kees, there may also be strong fluctuations in quality later on, as once he is fluent in baking a good apple pie, he will probably start experimenting again to refine the final product.
From a CDST perspective we would call the apple pie ingredients, which are themselves the outcome of a long dynamic process, nested sub-systems, and the processes of mixing and baking, which are inherently dynamic in themselves, a complex dynamic process which self-organizes continually. And by charting the quality of each apple pie each week, we can infer something about the developmental process in learning to bake apple pies. We would expect strong fluctuations early on, and there may be a smaller bandwidth of fluctuations at later moments when the systems approach stability. Then again, stability may also generate an increase in variability when the baker starts to experiment with less common ingredients to raise the concept of apple pie to the next level. But to summarize, different degrees of variability may occur at different phases in development, but there is always variability. In other words, variability is as common as apple pie.
More seriously, looking back at about 20 years of research into L2 development within the framework of CDST, we can now identify what it has brought us, what the challenges are, and what the most important future directions will be. We have identified several key characteristics of Complex Dynamic Systems that have a long history in science, and more recently in cognition, but can now convincingly be applied to the use and development of language. Like other complex dynamic systems, language development is best explained as resulting from the dynamic interaction of many subsystems, which is referred to as self-organization. Language use and language development are also continuously changing, as we simply cannot expect all relevant subsystems to be in a stable stage at the same time: there will always be at least one subsystem that changes, resulting in the subsequent instability of the system as a whole. It is from this perspective that we would like to discuss the nature of variability in language use and language development. Perhaps the most general, but also the most influential conclusion we can draw is that L2 development is best characterized as an iterative development at a multitude of time scales. This is very clearly phrased by Thelen (2005: 271): "Every act in every moment is the emergent product of context and history, and no component has causal priority."
The emergent product of our speech production system is the utterance that the speaker produces. The utterance, then, is the emergent product of context and history, and follows from the interaction of subsystems, or self-organization, at that moment and at that place for that speaker. And since neither the history nor the context will ever return to the same unique constellation, a different utterance will emerge at a different moment in a different context. This, then, could be perceived as variability of the learner's utterances over time, or intra-learner variability. The question is whether we can rightly call variability functional and what evidence we have so far. Below we give four examples of longitudinal data to illustrate the function of variability in L2 development. Some data are of absolute beginners and some of more advanced learners and the data show findings from different linguistic dimensions, from phonological development to L2 writing.

Variability in beginner L2 pronunciation
The first example is that of a five-year-old American boy (B) who temporarily lived in the Netherlands with his American parents and learned Dutch in a naturalistic setting. This example focuses on the development of the phonological system of the learner. During B's stay in the Netherlands we closely followed his Dutch pronunciation for about a year (Lowie 2013) by weekly measurements of speech production in several speaking conditions as well as weekly measurements of speech perception. We collected data and analysed many different aspects of his pronunciation. For the current example we will concentrate on Dutch vowel production in a speech repetition task, and we will focus on the development of the Dutch closed front rounded vowel /y/. In English, all rounded vowels (/u/, /ʊ/, /o/, /ɔ/) are back vowels, and there are no rounded front vowels. In Dutch, most rounded vowels are also back vowels, but /y/ is one of the exceptions. For English learners of Dutch the production of /y/ provides a major challenge, as it requires a new combination of the articulators. English learners of Dutch who are not in command of the articulation of front rounded vowels have basically two options. They can either hold on to the roundedness of the vowel and ignore its front position (/u/), or they can stick to the closed, front position of the vowel and ignore its roundedness (towards /i/). This is also typically what has been observed for English learners of French /y/ (see Flege 1987). And although the production of /y/ can normally be acquired after some time and effort, the vowel production by native English speakers is normally characterized by a relatively small difference between /y/ and /u/ (Crowther and Mann 1994; Flege 1987). See Figure 1 for an illustration of these options.
One interesting question is to what extent B reached the articulatory target /y/ in his production after six months. An acoustic analysis of the mean formant values of the vowels during each session shows a clear tendency towards native speaker values. We charted these data in Figure 2 for the first three formants, in which we marked the approximate formant values of the vowel (target formants) as produced by a group of age-matched Dutch native speakers. The vowel formants tend to be affected by the position of the various parts of the vocal tract. The F1 value roughly represents the open to close dimension, with low values associated with closed vowels (like /i/ and /u/). The F2 value is especially affected by the front to back dimension, with low values associated with vowels that are produced in the back of the mouth (like /u/). Lip rounding will lower the value of the F3 (but also the F2). When we compare the first and the last measurement of B's Dutch vowel production, the overall development clearly tends towards the native speaker values towards the end (Figure 2).
Although this outcome is interesting, we were primarily interested in the process of development. How does the development evolve over time? What we observed in our longitudinal data collected over a period of 14 weeks was that there was a seemingly random variation between variants of /i/ and variants of /u/, with the occasional combination of the two as a diphthong /ui/. The development is far from linear and shows a highly variable developmental trajectory (see Figure 3). The first few weeks B varies between the rounded /u/ and unrounded /i/, with the occasional diphthong /iu/, but later on, his productions tend towards the target /y/ much more often. He remains quite variable until the end, but the formant values do change. We argue that this type of variability is not intentional and not caused by any known external factor, but shows that the learner is aiming for a target sound and is constantly trying it out until he approaches native-like productions. The variability is functional in that without trying and experimenting to aim for a target form, there would be no change. A comparison to his older sister's data (her analysis is not included here) illustrates this observation: her data show very little variability, and there is hardly any change; accordingly, her production is not moving towards native speaker targets.

Variability as predictor of proficiency gains in beginner L2 learners
That variability is indeed functional and signals the learning process was also shown in our second data example, from a study reported on in Lowie and Verspoor (2019). This study traces the changing quality of writing samples, scored holistically on general proficiency, that were produced by 22 highly similar 13-year-old Dutch learners of English, collected as weekly assignments over a period of 21 weeks. The assignments were similar, and all samples were blindly rated by three trained raters who showed strong agreement in their scores. The learners of English in this dataset attended a bilingual stream of a Dutch school and all had high scores for scholastic aptitude, motivation and several other relevant individual differences. Since at least half of their classes were in English, they had very extensive language contact with English. To assess the progress of the learners, we compared the average score of the first two samples to the average score of the last two samples (see Figure 4). Not surprisingly, the mean ratings of the samples collected towards the end of the data collection period were significantly higher than the mean rating of the early samples. This is clearly shown in the bar graph on the right in Figure 4. Even though there was some dispersion (more towards the end), the group effect was clearly significant (Wilcoxon signed rank = 20.50; p < 0.01; effect size = 0.84).
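As a rough illustration of the kind of paired comparison reported above, the sketch below computes a Wilcoxon signed-rank statistic for each learner's early and late mean scores. It is a minimal sketch under stated assumptions, not the authors' analysis: the function name and the toy scores are hypothetical, and the handling of zero differences (dropped) and tied differences (average ranks) is a common convention that the text does not specify.

```python
def wilcoxon_signed_rank(early, late):
    """Return (W, W_plus, W_minus) for paired scores.

    Assumptions (not taken from the study): zero differences are dropped
    and tied absolute differences receive the average of their ranks.
    """
    diffs = [b - a for a, b in zip(early, late) if b != a]
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), w_plus, w_minus


# Hypothetical toy scores for four learners (not the study's data).
early_means = [1, 2, 3, 4]
late_means = [2, 4, 6, 3]
w, w_plus, w_minus = wilcoxon_signed_rank(early_means, late_means)
```

A small W relative to the total rank sum indicates that nearly all learners changed in the same direction; turning W into a p-value or an effect size requires a further step that is not shown here.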
However, when we compare the group findings with the individual trajectories at this timescale, we see enormous amounts of variation. The data show that not a single learner develops in exactly the same manner, neither in holistic ratings (as illustrated in Figure 5) nor in any other analytic score such as average sentence length or average word length. The data show that all learners show variability in their writing over time and that most (but not all) learners show improvement of their writing in the course of the 21 weeks of data collection.
Intuitively, we may be tempted to account for the variability in the quality of each learner's writing samples. At one point, a learner may be more motivated than at another point, the learner may have had a positive or negative school experience, or the learner may have had a very positive or negative association with the writing topic set. These are indeed known contributors to the variability we see, but Thelen and Smith (1994) would argue that a system of a beginner (like the learners in these data), which has not stabilized yet, is more likely to fluctuate from day to day than a system that has stabilized to a degree. In other words, there will always be some variability, but a greater bandwidth of variability shows us that a learner is aiming high in performance at one point and less so at another point. However, aiming high is where actual learning takes place, and the degree of variability in a learner's repertoire could be a predictor of proficiency gains. And this is indeed what we found. Even though the original aim of the study was to see to what degree individual factors such as motivation, aptitude or out of school exposure could predict proficiency gains, not a single factor turned out to be a reliable predictor of proficiency gains of individual learners.

Peaks in variability in the lexical subsystem in an advanced L2 learner
The first two data sets showed the development of a beginning learner and a group of relatively early learners. With regard to variability, we have argued that we may see strong fluctuations even after the system has settled, because an advanced learner may also be aiming for a new strategy. This is clearly shown in Penris and Verspoor (2017), who traced one learner of English as a foreign language over the course of 13 years with a five-year gap. After high school he is at an estimated low B2 level when he enters a teacher training college, and when he finishes his MA thesis at a university (an international, all-English MA program in Applied Linguistics), he is at an estimated C2 level. From all the texts he wrote, about 50 were selected because they were similar in genre. The detailed analyses show that his writing development is a long, complex, dynamic process, in which different sub-components of the language change in interaction with each other. During the teacher training programme, the language develops quite differently from the time he is in the university programme, which requires a more academic register. As the learner's language develops, longer noun phrases occur, and more academic words appear, as reflected in a longer average word length. The linguistic system also becomes more accurate as the process of acquisition continues, at one point quite abruptly. After examining a great number of various measures of complexity, accuracy and fluency, the authors concluded that not a single linguistic variable developed linearly, even though some levelled off during development. During the 13 years, very few variables showed a peak in development, except the use of complex-compound sentences and the use of unique lexical items (words the learner had not used before) (see Figure 6). Whether a peak is significant can be tested by randomly reshuffling the data 5000 times and counting how often a similar peak occurs in the shuffled data sets. If such a peak occurs fewer than 250 times (5% of 5000 shuffles), it is considered significant (α at 0.05). From the analyses it was clear that a significant peak was found in unique lexical items at data points 35 and 39, suggesting that early on at university the learner was trying to broaden his lexical repertoire, which also resulted in a significant difference between the overall use of unique lexical items in the teacher training and university phase.
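The reshuffling procedure described above can be sketched as a small Monte Carlo test. This is a minimal sketch, not the authors' implementation: the text does not say how a "peak" was operationalized, so the sketch assumes, purely for illustration, that the peak is the highest value of a moving average over three adjacent data points. The function names, the window size and the toy series are all hypothetical.

```python
import random


def moving_average_peak(series, window=3):
    """Highest moving average over `window` adjacent points.

    Assumption: this is one possible definition of a 'peak'; the source
    does not specify the one that was actually used.
    """
    return max(
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    )


def count_shuffled_peaks(series, window=3, n_shuffles=5000, seed=0):
    """Count how many shuffled series show a peak at least as high as observed."""
    rng = random.Random(seed)
    observed = moving_average_peak(series, window)
    shuffled = list(series)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroys the temporal order, keeps the values
        if moving_average_peak(shuffled, window) >= observed:
            hits += 1
    return hits, hits / n_shuffles


# Hypothetical toy series: a short burst of high values in a flat baseline.
toy_series = [1] * 20 + [10, 10, 10] + [1] * 20
hits, p_value = count_shuffled_peaks(toy_series)
significant = hits < 250  # fewer than 5% of 5000 shuffles
```

Because shuffling keeps the values but breaks their order, the test asks whether high values cluster in time more than chance would predict. A window of a single point would be uninformative, since the maximum of a series does not change when it is shuffled.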

Variability as sign of controlled or automatic processes
The data shown so far contain between 14 and 50 occasions of weekly observations collected in longitudinal studies over a period of several months. However, as argued by de Bot et al. (2007), variability occurs at all timescales, and the pattern of variability is often nested. Therefore, we will finally elaborate on variability at a much shorter time scale. For this we looked at the pattern of variability of response times (RTs) in a word naming task with 520 observations in about 15 min. The RT values in this experiment varied between about 350 and 600 ms. Whereas word naming experiments with RT measurements are commonly used to compare mean response times of groups of people in different conditions, we focus on the intra-learner variability during the task. More detailed reports of these experiments can be found in Lowie et al. (2014) and Plat et al. (2018). The claim we make in these papers is that variability reveals the nature of the underlying system. Similar to the way in which learners try out different vowel productions in speaking, or different sentence constructions in writing, the pattern of variability at the shorter time scale signals what is referred to as self-organization, or in this case 'self-organized criticality'. Like in any dynamic system, variability is the manifestation of the interaction of subsystems. Especially in these studies with numerous task repetitions at relatively short time scales measured under controlled experimental conditions, we can see that variability is not random, but follows a fractal pattern.
A fractal pattern is a repetition of a self-similar pattern over time, related to the difference in the magnitude of the changes: small changes occur very frequently, while bigger changes occur only occasionally. This is perhaps best illustrated by imagining the magnitude of avalanches that occur in a pile of rice when we keep adding grains of rice to the pile (see Bak et al. 1987). Small avalanches will occur very frequently, while massive slides in the pile will be infrequent. The pattern of variability of the size of the changes may indicate a dependency relation between the changes. What happens at the next point in time may depend on the situation of the system (in this case the pile) at the previous point in time. In science, functional dependency between the changes in a system is commonly evaluated by plotting the power spectrum of the time series against frequency on logarithmic scales (a log-log plot). This calculation allows us to assess whether the size of the changes scales as a power of their frequency. The steps in calculating the power spectrum in a log-log plot are illustrated in Figure 7. The resulting slope of the line in the log-log plot shows the relative dependency of the variability pattern. A fully random pattern will result in a horizontal line (spectral slope = 0), meaning that there is no dependency relation between the subsequent points in the time series. An over-regular pattern will result in a very steep downward slope (spectral slope = −2), showing a full and predictable dependency between the changes. But a characteristic of the self-organization of a dynamic system is that the dependency relation between the events in time will show a fractal pattern, which will show as the balance between the two (the spectral analysis will show a spectral slope = −1). This pattern of variability is seen in most naturally occurring dynamic physical and physiological phenomena, like the pile of rice (or any other avalanche effect; Bak et al. 1987), but also in human heart rate (Pagani et al. 1986). A healthy heart rate strikes a balance between regular and irregular, which is a manifestation of the flexibility of the system and its ability to rapidly respond to changing events. The fractal pattern with a spectral slope of −1 is commonly labelled as fractal coordination.
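The spectral slope described above can be estimated with a few lines of code. The following is a minimal sketch under stated assumptions, not the procedure used in the cited studies: it detrends and normalizes the series, computes a raw periodogram with a plain discrete Fourier transform, and fits a straight line to log power against log frequency. Restricting the fit to the lowest 25% of frequencies (`low_freq_fraction`) is an assumption made here for stability; the text does not specify a frequency range. All function names are hypothetical.

```python
import cmath
import math
import random


def detrend_and_normalize(series):
    """Remove a least-squares linear trend, then scale to unit variance."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    trend = sxy / sxx
    residuals = [y - (y_mean + trend * (t - t_mean))
                 for t, y in enumerate(series)]
    sd = math.sqrt(sum(r * r for r in residuals) / n)
    return [r / sd for r in residuals]


def spectral_slope(series, low_freq_fraction=0.25):
    """Slope of log10(power) against log10(frequency).

    Assumption: only the lowest `low_freq_fraction` of the frequencies
    below the Nyquist limit enter the fit.
    """
    z = detrend_and_normalize(series)
    n = len(z)
    k_max = max(2, int((n // 2) * low_freq_fraction))
    log_f, log_p = [], []
    for k in range(1, k_max + 1):
        coeff = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2 / n
        if power > 0:  # guard against log10(0)
            log_f.append(math.log10(k / n))
            log_p.append(math.log10(power))
    f_mean = sum(log_f) / len(log_f)
    p_mean = sum(log_p) / len(log_p)
    num = sum((f - f_mean) * (p - p_mean) for f, p in zip(log_f, log_p))
    den = sum((f - f_mean) ** 2 for f in log_f)
    return num / den


# Simulated reference series (not experimental data).
rng = random.Random(1)
white_noise = [rng.gauss(0, 1) for _ in range(1024)]
random_walk, total = [], 0.0
for step in white_noise:
    total += step
    random_walk.append(total)

slope_white = spectral_slope(white_noise)  # expected to lie near 0
slope_walk = spectral_slope(random_walk)   # expected to lie near -2
```

The two simulated series mark the two ends of the scale described in the text: uncorrelated noise and a random walk. A response-time series showing fractal coordination would be expected to fall between them, with a slope in the neighbourhood of −1.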
Fractal coordination is very often found in variability patterns in repeated measurements in human behaviour (Kello et al. 2007). And, very interestingly, it has been found that learning is associated with an increasing tendency towards more fractal coordination in physical tasks. For instance, Wijnants et al. (2009) asked participants to draw a straight line between two points with their non-preferred hand in five blocks of 1100 trials. Their analysis of the variability of movement times between the two points showed that not only the speed increased with practice, but also the fractal coordination of the system increased over time. Fractal coordination coincides with increased control of the task.
Interestingly, such a pattern of fractal coordination is also found in response to items in response-time experiments using language tasks like naming (Van Orden et al. 2003). If the analysis of variability can be used to estimate the amount of control, an interesting application is the comparison of language tasks in the speaker's L1 and L2. And this is what we tried to do in the naming task in the study mentioned above. Lowie et al. (2014) asked bilingual participants to name 520 words in two languages, before and after a period of extensive language contact with one of their languages. A spectral analysis of the response times showed that naming in L1 generally shows more fractal coordination than naming in L2, indicating a higher degree of control. But this study also showed that strongly increased language contact with one of the languages during several weeks consistently leads to a higher degree of fractal coordination during naming tasks in that language and a lower fractal coordination in the language the speaker had no contact with.
Since the start of language-related studies into fractal coordination by Van Orden et al. (2003) several studies have been conducted using a variety of settings and conditions. This line of research is still young, but the studies that have been conducted so far show that the analysis of variability reveals very interesting insights about the underlying system shown by the dependency relations during cognitive tasks on the shorter time scale (up to 20 min) and offers promising possibilities for further research.

Conclusion
In group studies in which the mean scores between groups are the main focus, variability tends to be considered as 'noise' or 'error'. With four data examples of variability in longitudinal case studies we have tried to show that variability in the use of language over time can be meaningful. Our data show that variability can provide valuable information about the process of development. We have seen that increased variability is usually a good sign as it shows the learner is trying out different linguistic constructions, from sentence structure to sound production. In other words, variability can be evidence for change, and change can be learning. Conversely, the absence of variability is likely to show that no learning is going on. When we consider the process of development, we must realize that variability may occur at all time scales, from the lifespan to milliseconds, and that development at the different time scales is nested. In most of the studies reported in this paper, we collected data on a weekly basis, which showed variability in the course of 14 to 50 weeks. But it is likely that we would also have found variability when collecting data every day or even every hour, showing different dimensions of change of the language systems. In our final example we even showed that variability in real-time language processing can help to quantify the relative degree of coordination and control of the complex dynamic system.
The variability in our longitudinal analyses also shows that development is an iterative process. Each step in time depends on the previous steps in time within the context in which they occur. In other words, development, and in our case language development, is an emergent process. And since the sequence of steps is the result of dynamically interacting subsystems in changing contexts, the process of development is unique for every individual. And this is what our data also clearly show. Each individual trajectory, even for similar learners in similar contexts, may be different. The 22 learners in the second data example show clearly different patterns, and if we look at the development of /y/ for 10 similar American learners of Dutch, we can expect different developmental trajectories even though the general learning principles may be the same. And although variability may seem to be random and unstructured, it can be seen as the manifestation of the system's functional self-organization that serves a clear developmental goal (see also Singleton, this special issue). After abundant variations, after numerous instances of trial and error, the system will lapse into a new state and will have reached a stage in emerging development.
In the last decade, the field of Applied Linguistics has seen a strong move from one-shot group studies to intensive longitudinal case studies. This may still be a consequence of the social turn, but in our view, the more socially oriented researchers have now overwhelmingly moved to longitudinal, qualitative research. For instance, in the Conversation Analysis (CA) tradition, change over time has not attracted much research. Sociolinguistic studies, especially when focusing on language change, tend to take the time dimension on board, but the perspective on variability is drastically different. Language variation may be used as a source of information to account for language change, but the focus is normally on the explanation of the variation by a fixed number of factors. In that sense our conceptualization of variability is rather different.
While we value the study of variability and variation in language use, language change and language learning, we should also be aware that the observation of the presence of variability and variation depends on the observer's perspective. We can assume that for the language user each utterance may be the only possible utterance at that moment and in that context; or, returning to Thelen (2005), each utterance is the emergent product of context and history. From the language user's perspective, then, variability does not exist. A clear illustration of this perspective is given by Levine (2020): "I once saw a cartoon in which a frog sitting by the bank of a stream asks a fish swimming by, 'How's the water?', to which the fish replies, 'What is water?'" (Levine 2020: 20). We may simply not be aware of the process that we are part of. When Kees is again trying to bake the ultimate apple pie, each of his carefully selected ingredients is the best possible available to him at that time and at that moment, and each of his carefully considered operations and all equipment used is the best possible option at that time and in that place, including his optimal concentration in spite of an unpredictable, sometimes noisy and disturbing context. And yet, we know that each time the apple pie is a little different. We may also assume that his apple pie today is better than one of the first apple pies he ever made. An observer's perspective in a lifelong longitudinal baking study may witness moments of variability, fractal coordination, trial and error, and developmental stages that signal an emerging, self-organizing system. But the baker just bakes his best apple pie every single time.

Figure 1: Vowel diagram for English sounds, with the estimated Dutch rounded front vowel added in the top left corner (when pronounced in context most productions by Dutch speakers will be somewhat more central).

Figure 2: Representation of the first three vowel formants (blue = F1, red = F2, green = F3) at an early and a late measurement of the participant's production of the front rounded vowel /y/ in Dutch. The coloured bands indicate the formant frequencies for age-matched Dutch native speakers.

Figure 3: Longitudinal measurements of the participant's production of Dutch front rounded vowel /y/, represented by the first three formants (blue = F1, red = F2, green = F3). The coloured bands indicate the formant frequencies for age-matched Dutch native speakers.

Figure 4: Group analysis of the writing samples of 22 Dutch learners of English, as rated by trained raters. The graph shows the mean ratings at the first two samples and the last two samples.

Figure 5: Development of 22 individual learners over time, based on the rating of their writing samples. The figure shows the continuous measurements over 21 weeks.

Figure 7: (A-C) The spectral analysis of the intact trial series of the naming task responses. (D-F) The same analysis of the randomized data. (A, D) Detrended and normalized reaction time trial. Figure adapted from Plat et al. (2018).