Václav Cvrček (PhD Charles University Prague, 2008) is an Associate Professor of corpus linguistics at Charles University, Prague. His main research interests are corpus linguistics methodology, quantitative linguistics, and corpus-assisted discourse studies.
Zuzana Komrsková holds an MA in Czech language; she is currently a PhD student at the Institute of Czech Language and Theory of Communication, Charles University, Prague. Her research interests are Internet communication, corpus linguistics, and corpora of spoken language.
David Lukeš holds an MA in Phonetics and English & American Studies; he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His research interests include Czech phonetics, corpora of spoken language, and quantitative methods in linguistics.
Petra Poukarová holds an MA in Czech Language and Literature; she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her research focuses on Czech phonetics, corpus linguistics, and corpora of spoken language.
Anna Řehořková holds an MA in Czech language; she is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. Her research interests are diachronic linguistics, Czech subordinate clauses, and corpus linguistics.
Adrian Jan Zasina holds an MA in Slavic Studies; he is currently a PhD student at the Institute of the Czech National Corpus, Charles University, Prague. His main research interests include data-driven learning, gender & language, and corpus linguistics.
This paper is part of a larger research effort on language variability aimed at uncovering the relations between extra- and intratextual characteristics of Czech texts by means of multi-dimensional analysis. The palpable lack of prior work on quantitative register analysis of Czech led to several distinctive methodological decisions concerning corpus design, feature selection, and the parameters of factor analysis, especially the number of dimensions to extract. We report on these decisions for their potential relevance to other researchers embarking on a similar journey. To demonstrate the viability of the model, we also present a brief interpretation of the resulting dimensions.
Auer, Peter. 2009. On-line syntax: Thoughts on the temporality of spoken language. Language Sciences 31. 1–13.
Bermel, Neil. 2014. Czech diglossia: Dismantling or dissolution? In Judit Árokay, Jadranka Gvozdanović & Darja Miyajima (eds.), Divided languages? Diglossia, translation and the rise of modernity in Japan, China and the Slavic World, 21–37. Dordrecht: Springer.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas. 1990. Methodological issues regarding corpus-based analyses of linguistic variation. Literary and Linguistic Computing 5(4). 257–269.
Biber, Douglas. 1995. Dimensions of register variation: A cross-linguistic comparison. Cambridge: Cambridge University Press.
Biber, Douglas. 2014. Using multi-dimensional analysis to explore cross-linguistic universals of register variation. Languages in Contrast 14(1). 7–34.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style. Cambridge: Cambridge University Press.
Biber, Douglas, Mark Davies, James K. Jones & Nicole Tracy-Ventura. 2006. Spoken and written register variation in Spanish: A multi-dimensional analysis. Corpora 1(1). 1–37.
Biber, Douglas & Jesse Egbert. 2016. Register variation on the searchable web: A multi-dimensional analysis. Journal of English Linguistics 44(2). 95–137.
Čechová, Marie, Marie Krčmová & Eva Minářová (eds.). 2008. Současná stylistika [Contemporary stylistics]. Prague: Nakladatelství Lidové noviny.
Čermák, František. 2014. Lexis in spoken and written language. In František Čermák (ed.), Jazyk a slovník. Vybrané lingvistické studie [Language and dictionary. Selected studies in linguistics], 299–304. Prague: Karolinum.
Čmejrková, Světla & Jana Hoffmannová (eds.). 2011. Mluvená čeština: hledání funkčního rozpětí [Spoken Czech: In search of the range of its functions]. Prague: Academia.
Cvrček, Václav & Lucie Chlumská. 2015. Simplification in translated Czech: A new approach to type-token ratio. Russian Linguistics 39(3). 309–325.
Cvrček, Václav, Vilém Kodýtek, Marie Kopřivová, Dominika Kováříková, Petr Sgall, Michal Šulc, Jan Táborský, Jan Volín & Martina Waclawičová. 2010. Mluvnice současné češtiny 1 [A grammar of contemporary Czech 1]. Prague: Karolinum.
Cvrček, Václav, Zuzana Komrsková, David Lukeš, Petra Poukarová, Anna Řehořková & Adrian J. Zasina. Forthcoming. Variabilita češtiny: multidimenzionální analýza [Variability in Czech: A multi-dimensional analysis]. Slovo a slovesnost.
Egbert, Jesse & Erin Schnur. 2018. The role of the text in corpus and discourse analysis. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse. A critical review, 158–170. New York: Routledge.
Halliday, Michael A. K. & Ruqaiya Hasan. 1976. Cohesion in English. London: Longman.
Hoffmannová, Jana, Jiří Homoláč, Eliška Chvalovská, Lucie Jílková, Petr Kaderka, Petr Mareš & Kamila Mrázková (eds.). 2016. Stylistika mluvené a psané češtiny [The stylistics of spoken and written Czech]. Prague: Academia.
Karlík, Petr, Marek Nekula & Zdenka Rusínová (eds.). 1995. Příruční mluvnice češtiny [A reference grammar of Czech]. Prague: Nakladatelství Lidové noviny.
Kodýtek, Vilém. 2008. Variace v mluvené češtině v Čechách: sonda do ORAL2006 [Variation in spoken Czech in Bohemia: Exploring the ORAL2006 corpus]. In Marie Kopřivová & Martina Waclawičová (eds.), Čeština v mluveném korpusu, 132–141. Prague: Nakladatelství Lidové noviny.
Sgall, Petr, Jiří Hronek, Alexandr Stich & Ján Horecký (eds.). 1992. Variation in language. Code switching in Czech as a challenge for sociolinguistics. Amsterdam & Philadelphia: John Benjamins.
Zasina, Adrian J., David Lukeš, Zuzana Komrsková, Petra Poukarová & Anna Řehořková. 2018. Koditex (A corpus of diversified texts). Prague: Institute of the Czech National Corpus, Faculty of Arts, Charles University.
Corpus Linguistics and Linguistic Theory publishes high-quality, corpus-based research focusing on theoretically relevant issues in all core areas of linguistic research (phonology, morphology, syntax, semantics, pragmatics) and other recognized topic areas. The journal features corpus-based articles that develop new methods, evaluate theoretical claims, and offer analyses of linguistic phenomena within a theoretical framework.