Published by De Gruyter Mouton, March 9, 2022

Syntactic complexity in Finnish-background EFL learners’ writing at CEFR levels A1–B2

  • Ghulam Abbas Khushik and Ari Huhta

Abstract

The increasing importance of the Common European Framework of Reference (CEFR) has led to research on the linguistic characteristics of its levels, because such knowledge would support the application of the CEFR in the design of teaching materials, courses, and assessments. This study investigated whether CEFR levels can be distinguished with reference to syntactic complexity (SC). Finnish learners of English aged 14 and 17 (N=397) completed three writing tasks, which were rated against the CEFR levels. The ratings were analysed with multi-facet Rasch analysis and the texts were analysed with automated tools. The findings suggest that the clearest separators at the lower CEFR levels (A1–A2) were mean sentence and T-unit length, variation in sentence length, infinitive density, clauses per sentence or T-unit, and verb phrases per T-unit. At the higher levels (B1–B2) they were modifiers per noun phrase, mean clause length, complex nominals per clause, and left embeddedness. The results support previous findings that the length of and variation in the longer production units (sentences, T-units) are the SC indices that most clearly separate the lower CEFR levels, whereas the higher levels are best distinguished in terms of complexity at the clausal and phrasal levels.

Abstract (from the Finnish)

The importance of the Common European Framework of Reference (CEFR) for language education has increased research on the linguistic features of its proficiency levels; more precise knowledge of these features would help in applying the CEFR to the design of teaching materials, courses, and assessment. The study examined whether the CEFR levels differ from one another in syntactic complexity. Finnish 14- and 17-year-old learners of English (N=397) wrote three texts, which were rated on the CEFR proficiency levels. The rating data were examined with multi-facet Rasch analysis, and the features of the texts were identified with automated analysis tools. According to the results, the lowest CEFR levels (A1–A2) were most clearly separated by the length of sentences and T-units, variation in sentence length, the number of infinitive structures, the number of clauses per sentence or T-unit, and the number of verb structures per T-unit. The higher levels (B1–B2), in turn, were separated by the number of modifiers in noun phrases, clause length, the number of complex structures in clauses, and the number of words preceding the main verb (left embeddedness). The results are in line with earlier studies of syntactic complexity in that the length of and variation in the longer production units (sentences, T-units) most clearly separate learners of English at the lower CEFR levels, whereas at the higher proficiency levels the differences appear in the use of clauses and phrases.

Summary (from the Swedish)

The growing importance of the Common European Framework of Reference (CEFR) has led to research on the linguistic characteristics of the CEFR levels, since such research can support the application of the CEFR in the planning of teaching materials, courses, and assessment. This study investigated whether there are differences in syntactic complexity (SC) between the CEFR levels. Finnish-speaking learners of English aged 14 and 17 (N=397) wrote three writing tasks, which were rated according to the CEFR levels. The ratings were analysed with multi-facet Rasch analysis and the texts were analysed with automated tools. The findings indicate that the clearest distinguishing factors at the lower CEFR levels (A1–A2) were the mean length of sentences and T-units, variation in sentence length, the density of infinitives, the number of clauses per sentence or T-unit, and the number of verb phrases per T-unit. At the higher levels (B1–B2) the factors were the number of modifiers per noun phrase, mean clause length, the number of complex nominals per clause, and the number of words before the main verb (left embeddedness). The results support earlier findings that the length of and variation in the longer production units (sentences, T-units) are the SC indices that most clearly distinguish the lower CEFR levels, whereas the higher levels differ most from one another in complexity at the clausal and phrasal levels.


Appendix 1

Table A

Descriptive statistics for the count variables across CEFR levels: grade 8

| Index (number of ...) | A1 (n=37) M | A1 SD | A2 (n=87) M | A2 SD | B1 (n=70) M | B1 SD | B2 (n=8) M | B2 SD |
|---|---|---|---|---|---|---|---|---|
| Words | 29.11 | 14.14 | 55.98 | 14.77 | 72.46 | 13.25 | 88.81 | 16.98 |
| Sentences | 3.44 | 1.60 | 5.29 | 1.85 | 6.18 | 1.76 | 7.56 | 1.52 |
| Clauses | 5.08 | 2.47 | 9.34 | 2.69 | 11.42 | 2.44 | 13.31 | 2.83 |
| T-units | 3.54 | 1.88 | 5.67 | 1.93 | 6.86 | 1.72 | 8.40 | 1.28 |
| Verb phrases | 5.33 | 2.83 | 10.04 | 2.83 | 12.75 | 2.72 | 15.35 | 2.34 |
| Dependent clauses | 1.31 | 0.77 | 2.89 | 1.17 | 3.68 | 1.31 | 4.17 | 1.57 |
| Complex T-units | 1.17 | 0.69 | 2.24 | 0.93 | 2.80 | 0.91 | 3.46 | 1.14 |
| Coordinate phrases | 0.45 | 0.43 | 0.97 | 0.68 | 1.00 | 0.68 | 0.90 | 0.67 |
| Complex nominals | 1.72 | 1.18 | 3.20 | 1.21 | 4.50 | 1.67 | 5.71 | 3.34 |

Table B

Descriptive statistics for the count variables across CEFR levels: Gymnasium

| Index (number of ...) | A2 (n=31) M | A2 SD | B1 (n=125) M | B1 SD | B2 (n=39) M | B2 SD |
|---|---|---|---|---|---|---|
| Words | 70.19 | 20.16 | 87.16 | 16.65 | 110.57 | 22.73 |
| Sentences | 5.85 | 1.94 | 6.62 | 1.67 | 8.18 | 4.67 |
| Clauses | 10.15 | 3.11 | 11.67 | 2.66 | 13.34 | 2.93 |
| T-units | 6.62 | 2.29 | 7.65 | 1.96 | 8.56 | 1.75 |
| Verb phrases | 11.60 | 3.81 | 14.25 | 3.22 | 17.40 | 3.72 |
| Dependent clauses | 3.59 | 1.62 | 4.18 | 1.65 | 5.01 | 1.87 |
| Complex T-units | 2.78 | 1.16 | 3.22 | 1.08 | 3.96 | 1.23 |
| Coordinate phrases | 1.45 | 0.99 | 1.79 | 0.87 | 2.27 | 1.24 |
| Complex nominals | 6.44 | 2.33 | 7.59 | 1.91 | 10.52 | 3.12 |

Table C

Descriptive statistics for the syntactic complexity indices from L2SCA across CEFR levels: grade 8

| Index | A1 (n=37) M | A1 SD | A2 (n=87) M | A2 SD | B1 (n=70) M | B1 SD | B2 (n=8) M | B2 SD |
|---|---|---|---|---|---|---|---|---|
| Sentence length | 8.58 | 2.01 | 11.49 | 2.95 | 12.66 | 3.08 | 12.71 | 4.49 |
| T-unit length | 8.53 | 2.35 | 11.08 | 2.97 | 11.42 | 2.43 | 11.02 | 3.11 |
| Clause length | 5.97 | 1.44 | 6.38 | 0.92 | 6.59 | 0.81 | 6.99 | 0.79 |
| Clauses per sentence | 1.52 | 0.48 | 1.88 | 0.51 | 1.98 | 0.57 | 1.89 | 0.62 |
| Clauses per T-unit | 1.44 | 0.38 | 1.77 | 0.47 | 1.76 | 0.41 | 1.62 | 0.41 |
| Complex T-units per T-unit | 0.35 | 0.21 | 0.44 | 0.20 | 0.44 | 0.17 | 0.43 | 0.19 |
| Dependent clauses per clause | 0.24 | 0.14 | 0.30 | 0.11 | 0.32 | 0.09 | 0.30 | 0.08 |
| Dependent clauses per T-unit | 0.45 | 0.34 | 0.62 | 0.36 | 0.63 | 0.29 | 0.52 | 0.22 |
| Coordinate phrases per clause | 0.09 | 0.10 | 0.12 | 0.10 | 0.09 | 0.07 | 0.07 | 0.06 |
| Coordinate phrases per T-unit | 0.13 | 0.14 | 0.22 | 0.18 | 0.18 | 0.15 | 0.11 | 0.09 |
| T-units per sentence | 1.01 | 0.22 | 1.06 | 0.15 | 1.12 | 0.15 | 1.13 | 0.13 |
| Complex nominals per clause | 0.41 | 0.22 | 0.41 | 0.16 | 0.43 | 0.16 | 0.42 | 0.22 |
| Complex nominals per T-unit | 0.60 | 0.38 | 0.74 | 0.37 | 0.80 | 0.37 | 0.73 | 0.48 |
| Verb phrases per T-unit | 1.52 | 0.45 | 1.92 | 0.54 | 1.97 | 0.43 | 1.88 | 0.44 |

Table D

Descriptive statistics for the syntactic complexity indices from L2SCA across CEFR levels: Gymnasium

| Index | A2 (n=31) M | A2 SD | B1 (n=125) M | B1 SD | B2 (n=39) M | B2 SD |
|---|---|---|---|---|---|---|
| Sentence length | 13.06 | 3.59 | 14.24 | 3.78 | 15.44 | 2.49 |
| T-unit length | 12.01 | 3.14 | 12.73 | 3.45 | 13.88 | 2.17 |
| Clause length | 7.42 | 1.16 | 7.91 | 1.32 | 8.91 | 1.75 |
| Clauses per sentence | 1.84 | 0.47 | 1.88 | 0.48 | 1.89 | 0.49 |
| Clauses per T-unit | 1.67 | 0.39 | 1.66 | 0.41 | 1.65 | 0.32 |
| Complex T-units per T-unit | 0.50 | 0.19 | 0.49 | 0.17 | 0.51 | 0.15 |
| Dependent clauses per clause | 0.34 | 0.12 | 0.37 | 0.11 | 0.40 | 0.14 |
| Dependent clauses per T-unit | 0.66 | 0.33 | 0.68 | 0.42 | 0.68 | 0.25 |
| Coordinate phrases per clause | 0.16 | 0.11 | 0.17 | 0.10 | 0.18 | 0.10 |
| Coordinate phrases per T-unit | 0.25 | 0.17 | 0.26 | 0.17 | 0.28 | 0.16 |
| T-units per sentence | 1.12 | 0.18 | 1.15 | 0.14 | 1.14 | 0.10 |
| Complex nominals per clause | 0.71 | 0.27 | 0.71 | 0.23 | 0.87 | 0.28 |
| Complex nominals per T-unit | 1.16 | 0.46 | 1.16 | 0.41 | 1.34 | 0.34 |
| Verb phrases per T-unit | 2.01 | 0.53 | 2.08 | 0.60 | 2.18 | 0.35 |

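
The length and ratio indices reported for L2SCA are, by their usual definitions, simple quotients of production-unit counts such as those in Tables A and B (for example, sentence length is words divided by sentences). The following is a minimal Python sketch of that arithmetic for a single text. The `UnitCounts` container, the `ratio_indices` function, and the example counts are hypothetical illustrations, not part of L2SCA or of the study data.

```python
from dataclasses import dataclass


@dataclass
class UnitCounts:
    """Production-unit counts for one learner text (hypothetical container)."""
    words: int
    sentences: int
    t_units: int
    clauses: int
    verb_phrases: int


def ratio_indices(c: UnitCounts) -> dict[str, float]:
    """Return length and ratio indices as simple quotients of the counts."""
    return {
        "sentence_length": c.words / c.sentences,
        "t_unit_length": c.words / c.t_units,
        "clause_length": c.words / c.clauses,
        "clauses_per_sentence": c.clauses / c.sentences,
        "clauses_per_t_unit": c.clauses / c.t_units,
        "verb_phrases_per_t_unit": c.verb_phrases / c.t_units,
    }


# Invented counts for a single short text, not taken from the study data.
example = UnitCounts(words=60, sentences=5, t_units=6, clauses=10, verb_phrases=12)
indices = ratio_indices(example)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

The group means of the ratio indices are presumably averages of per-text ratios, so dividing the mean counts in Tables A and B reproduces them only approximately: for grade 8 at A1, 29.11 / 3.44 ≈ 8.46, whereas the reported mean sentence length is 8.58.
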
Table E

Descriptive statistics for the syntactic complexity indices from Coh-Metrix across CEFR levels: grade 8

| Index | A1 (n=37) M | A1 SD | A2 (n=87) M | A2 SD | B1 (n=70) M | B1 SD | B2 (n=8) M | B2 SD |
|---|---|---|---|---|---|---|---|---|
| Sentence length (st.dev.) | 2.80 | 1.81 | 5.06 | 2.21 | 6.37 | 2.33 | 5.96 | 3.01 |
| Syntactic simplicity (z-score) | 0.57 | 0.92 | 0.29 | 0.76 | 0.21 | 0.73 | 0.46 | 1.21 |
| Syntactic simplicity (percentile) | 61.32 | 21.56 | 57.62 | 21.44 | 56.55 | 20.35 | 63.05 | 31.61 |
| Left embeddedness | 1.48 | 0.68 | 1.97 | 1.04 | 2.00 | 0.71 | 1.97 | 0.47 |
| Modifiers per noun phrase | 0.45 | 0.15 | 0.42 | 0.11 | 0.43 | 0.09 | 0.43 | 0.12 |
| Minimal edit distance for parts of speech | 0.36 | 0.26 | 0.53 | 0.20 | 0.63 | 0.11 | 0.64 | 0.14 |
| Sentence syntax similarity (adjacent sentences) | 0.14 | 0.08 | 0.11 | 0.05 | 0.10 | 0.04 | 0.11 | 0.04 |
| Noun phrase density | 379.86 | 59.27 | 371.30 | 34.22 | 353.00 | 25.53 | 356.43 | 26.83 |
| Verb phrase density | 229.27 | 45.49 | 251.37 | 30.78 | 258.22 | 25.07 | 255.82 | 20.95 |
| Adverbial phrase density | 23.43 | 31.90 | 28.74 | 13.82 | 33.52 | 15.16 | 38.54 | 18.25 |
| Preposition phrase density | 74.23 | 30.06 | 86.51 | 20.86 | 86.37 | 17.75 | 93.37 | 16.06 |
| Negation density | 37.80 | 18.59 | 32.42 | 14.14 | 32.05 | 12.41 | 27.96 | 9.63 |
| Gerund density | 8.05 | 13.68 | 6.57 | 8.48 | 11.08 | 8.75 | 7.82 | 6.66 |
| Infinitive density | 4.78 | 7.49 | 14.55 | 9.48 | 22.32 | 10.69 | 18.19 | 11.67 |

Table F

Descriptive statistics for the syntactic complexity indices from Coh-Metrix across CEFR levels: Gymnasium

| Index | A2 (n=31) M | A2 SD | B1 (n=125) M | B1 SD | B2 (n=39) M | B2 SD |
|---|---|---|---|---|---|---|
| Sentence length (st.dev.) | 5.18 | 1.74 | 6.75 | 3.61 | 7.64 | 2.34 |
| Syntactic simplicity (z-score) | 0.16 | 0.71 | -0.13 | 0.69 | -0.16 | 0.53 |
| Syntactic simplicity (percentile) | 55.18 | 19.98 | 46.05 | 20.31 | 44.42 | 17.24 |
| Left embeddedness | 2.35 | 1.11 | 2.44 | 0.84 | 3.09 | 1.02 |
| Modifiers per noun phrase | 0.51 | 0.11 | 0.56 | 0.12 | 0.66 | 0.15 |
| Minimal edit distance for parts of speech | 0.62 | 0.11 | 0.63 | 0.10 | 0.67 | 0.06 |
| Sentence syntax similarity (adjacent sentences) | 0.13 | 0.04 | 0.10 | 0.03 | 0.09 | 0.02 |
| Noun phrase density | 359.89 | 30.80 | 346.15 | 28.32 | 340.90 | 24.50 |
| Verb phrase density | 241.01 | 29.03 | 239.40 | 27.54 | 238.67 | 28.95 |
| Adverbial phrase density | 35.37 | 15.47 | 42.96 | 14.45 | 42.39 | 9.63 |
| Preposition phrase density | 82.78 | 27.61 | 91.37 | 18.29 | 97.13 | 17.86 |
| Negation density | 24.55 | 12.84 | 21.07 | 10.34 | 19.19 | 9.54 |
| Gerund density | 13.60 | 11.36 | 16.11 | 9.43 | 19.92 | 10.67 |
| Infinitive density | 15.96 | 13.93 | 17.20 | 9.67 | 20.52 | 10.65 |

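
Two of the Coh-Metrix index types above can be illustrated with simple arithmetic: the standard deviation of sentence length, and density scores, which Coh-Metrix reports as occurrences per 1,000 words. The sketch below is a rough Python illustration only. The function names, the whitespace tokenisation, the use of the sample standard deviation, and the three-sentence text are all assumptions made for the example; Coh-Metrix itself relies on a syntactic parser and may differ in these details.

```python
import statistics


def sentence_length_sd(sentences: list[str]) -> float:
    """Sample standard deviation of sentence length, measured in words."""
    lengths = [len(s.split()) for s in sentences]
    return statistics.stdev(lengths)


def density_per_1000(count: int, n_words: int) -> float:
    """Occurrences of a structure per 1,000 words of text."""
    return 1000 * count / n_words


# Invented three-sentence text; whitespace tokenisation is a simplification.
text = [
    "I like my dog.",
    "We go to the park every day because he needs to run.",
    "It is fun.",
]
sd = sentence_length_sd(text)
n_words = sum(len(s.split()) for s in text)

# The text contains one infinitive ("to run"), counted by hand here.
infinitive_density = density_per_1000(count=1, n_words=n_words)

print(f"sentence length SD: {sd:.2f}")
print(f"infinitive density: {infinitive_density:.2f}")
```

Because densities are scaled to 1,000 words, a single infinitive in a very short text yields a large value, which may help explain the wide standard deviations at the lowest levels, where texts are shortest.
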
Published Online: 2022-03-09
Published in Print: 2022-03-04

© 2022 Walter de Gruyter GmbH, Berlin/Boston

Downloaded on 4.10.2023 from https://www.degruyter.com/document/doi/10.1515/eujal-2021-0011/html