From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA

Václav Cvrček (ORCID iD: http://orcid.org/0000-0003-3977-2393)
  • Corresponding author
  • Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague, Czech Republic
  • Václav Cvrček (PhD Charles University Prague, 2008) is an Associate Professor of corpus linguistics at Charles University, Prague. His main research interests are corpus linguistics methodology, quantitative linguistics, and corpus-assisted discourse studies.
Zuzana Komrsková (ORCID iD: http://orcid.org/0000-0002-1170-9344)
  • Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague, Czech Republic
  • Zuzana Komrsková holds an MA in Czech language and is currently a PhD student at the Institute of Czech Language and Theory of Communication, Charles University, Prague. Her areas of interest are Internet communication, corpus linguistics, and corpora of spoken language.
David Lukeš (ORCID iD: http://orcid.org/0000-0003-0429-6542)
  • Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague, Czech Republic
  • David Lukeš holds an MA in Phonetics and English & American Studies; he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His research interests include Czech phonetics, corpora of spoken language, and quantitative methods in linguistics.
Petra Poukarová (ORCID iD: http://orcid.org/0000-0003-3707-6466)
  • Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague, Czech Republic
  • Petra Poukarová holds an MA in Czech Language and Literature; she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her research focuses on Czech phonetics, corpus linguistics, and corpora of spoken language.
Anna Řehořková (ORCID iD: http://orcid.org/0000-0002-6676-317X)
  • Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague, Czech Republic
  • Anna Řehořková holds an MA in Czech language and is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her academic areas of interest are diachronic linguistics, Czech subordinate clauses, and corpus linguistics.
Adrian Jan Zasina (ORCID iD: http://orcid.org/0000-0001-9348-5833)
  • Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague, Czech Republic
  • Adrian Jan Zasina holds an MA in Slavic Studies and is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His main research interests include data-driven learning, gender and language, and corpus linguistics.

Abstract

This paper is part of a larger research effort on language variability that aims to uncover the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. Because there is almost no prior quantitative register analysis of Czech, we had to make several distinctive methodological decisions, namely concerning corpus design, feature selection, and the parameters of factor analysis, especially the number of dimensions to extract. We report on these decisions because they may be relevant to other researchers embarking on a similar journey. To demonstrate the viability of the model, we also present a brief interpretation of the resulting dimensions.





Corpus Linguistics and Linguistic Theory publishes high-quality, corpus-based research focusing on theoretically relevant issues in all core areas of linguistic research (phonology, morphology, syntax, semantics, pragmatics) and other recognized topic areas. The journal features corpus-based articles that develop new methods, evaluate theoretical claims, and offer analyses of linguistic phenomena within a theoretical framework.
