Published by De Gruyter, November 6, 2021

From Keyness to Distinctiveness – Triangulation and Evaluation in Computational Literary Studies

Julian Schröter, Keli Du, Julia Dudar, Cora Rok and Christof Schöch


There is a set of statistical measures, developed mostly in corpus and computational linguistics and in information retrieval and known as keyness measures, which are generally expected to detect textual features that account for differences between two texts or groups of texts. These measures are based on the frequency, distribution, or dispersion of words (or other features). Searching for relevant differences or similarities between two groups of texts is also characteristic of traditional literary studies whenever two authors, two periods in the work of one author, two historical periods, or two literary genres are to be compared. Applying quantitative procedures to search for differences therefore seems promising in computational literary studies, as it allows researchers to analyze large corpora and to base historical hypotheses about differences between authors, genres, and periods on broader empirical evidence. However, applying quantitative procedures to questions relevant to literary studies in many cases raises methodological problems, which have been discussed on a more general level in the context of integrating or triangulating quantitative and qualitative methods in mixed methods research in the social sciences. This paper aims to solve these methodological issues concretely for the concept of distinctiveness and thus to lay the methodological foundation for operationalizing quantitative procedures so that they can be used not only as rough exploratory tools, but in a hermeneutically meaningful way for research in literary studies.
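
To make the idea of a frequency-based keyness measure concrete, the following is a minimal sketch of one widely used measure, the log-likelihood ratio (G²). The abstract does not single out any particular measure, so the choice of G², the function name `log_likelihood_keyness`, and the convention of signing the score by the direction of the difference are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_likelihood_keyness(target_tokens, reference_tokens):
    """Return {word: signed G2} for a target corpus against a reference corpus.

    Illustrative sketch only: G2 is one of several possible keyness measures.
    Positive scores mark words overrepresented in the target corpus,
    negative scores mark words overrepresented in the reference corpus.
    """
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    if n_target == 0 or n_reference == 0:
        raise ValueError("both corpora must contain at least one token")

    scores = {}
    for word in set(target) | set(reference):
        a = target[word]      # observed count in the target corpus
        b = reference[word]   # observed count in the reference corpus
        # Expected counts if the word had the same relative frequency in both.
        expected_a = n_target * (a + b) / (n_target + n_reference)
        expected_b = n_reference * (a + b) / (n_target + n_reference)
        g2 = 0.0
        if a > 0:
            g2 += a * math.log(a / expected_a)
        if b > 0:
            g2 += b * math.log(b / expected_b)
        g2 *= 2
        # Sign the score by the direction of the frequency difference.
        if a / n_target < b / n_reference:
            g2 = -g2
        scores[word] = g2
    return scores
```

A score of this kind ranks words only by how unevenly they are distributed between the two corpora. Whether the top-ranked words are distinctive in a hermeneutically meaningful sense is exactly the question the paper raises.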

Based on a structural definition of potential candidate measures for analyzing distinctiveness in the first section, we offer a systematic description of the issue of integrating quantitative procedures into a hermeneutically meaningful understanding of distinctiveness by distinguishing its epistemological from its methodological perspective. The second section develops a systematic strategy to solve the methodological side of this issue based on a critical reconstruction of the widespread non-integrative strategy in research on keyness measures that can be traced back to Rudolf Carnap’s model of explication. We demonstrate that it is, in the first instance, mandatory to gain a comprehensive qualitative understanding of the actual task. We show that Carnap’s model of explication suffers from a shortcoming that consists in ignoring the need for a systematic comparison of what he calls the explicatum and the explicandum. Only if there is a method of systematic comparison can the next task, namely that of evaluation, be addressed; evaluation verifies whether the output of a quantitative procedure corresponds to the qualitative expectation that must be clarified in advance. We claim that evaluation is necessary for integrating quantitative procedures into a qualitative understanding of distinctiveness. Our reconstruction shows that both steps are usually skipped in empirical research on keyness measures, which are the most important point of reference for the development of a measure of distinctiveness. Evaluation, which in turn requires thorough explication and conceptual clarification, needs to be employed to verify this relation.

In the third section we offer a qualitative clarification of the concept of distinctiveness by spanning a three-dimensional conceptual space. This flexible framework takes into account that there is no single and proper concept of distinctiveness but rather a field of possible meanings depending on research interest, theoretical framework, and access to the perceptibility or salience of textual features. Therefore, we shall, instead of stipulating any narrow and strict definition, take into account that each of these aspects – interest, theoretical framework, and access to perceptibility – represents one dimension of the heuristic space of possible uses of the concept of distinctiveness.

The fourth section discusses two possible strategies of operationalization and evaluation that we consider to be complementary to the previously provided clarification, and that complete the task of successfully establishing a candidate measure as a measure of distinctiveness in a qualitatively ambitious sense. We demonstrate that two different general strategies are worth considering, depending on the respective notion of distinctiveness and the interest as elaborated in the third section. If the interest is merely taxonomic, classification tasks based on multi-class supervised machine learning are sufficient. If the interest is aesthetic, more complex and intricate evaluation strategies are required, which have to rely on a thorough conceptual clarification of the concept of distinctiveness, in particular on the idea of salience or perceptibility. The challenge here is to correlate perceivable complex features of texts such as plot, theme (aboutness), style, form, or roles and constellation of fictional characters with the unperceived frequency and distribution of word features that are calculated by candidate measures of distinctiveness. Existing research has not yet clarified how to correlate such complex features with individual word features.
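
For the taxonomic case, evaluation by classification can be sketched roughly as follows: restrict the feature set to the words a candidate measure ranks as most distinctive, train a supervised classifier on labeled texts, and check how well held-out texts are assigned to their groups. The abstract does not say which classifier is used; the nearest-centroid classifier below and all function names (`vectorize`, `train_centroids`, `predict`, `accuracy`) are assumptions chosen only because they keep the sketch short and dependency-free.

```python
import math
from collections import Counter


def vectorize(tokens, features):
    """Relative frequencies of the selected feature words in one text."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[f] / total for f in features]


def train_centroids(labeled_texts, features):
    """Average the feature vectors of the training texts for each label."""
    sums = {}
    n_texts = Counter()
    for label, tokens in labeled_texts:
        vec = vectorize(tokens, features)
        previous = sums.get(label, [0.0] * len(features))
        sums[label] = [s + v for s, v in zip(previous, vec)]
        n_texts[label] += 1
    return {label: [s / n_texts[label] for s in vec] for label, vec in sums.items()}


def predict(tokens, centroids, features):
    """Assign a text to the label whose centroid is closest (Euclidean)."""
    vec = vectorize(tokens, features)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


def accuracy(test_texts, centroids, features):
    """Share of held-out texts assigned to their true label."""
    hits = sum(predict(tokens, centroids, features) == label
               for label, tokens in test_texts)
    return hits / len(test_texts)
```

In such a setup, `features` would hold the top-ranked words of the candidate measure, and accuracy on held-out texts gives one indication of whether those words separate the groups. High accuracy answers only the taxonomic question; it says nothing about whether readers perceive the selected words as salient.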

The paper concludes with a general reflection on the possibility of mixed methods research for computational literary studies in terms of explanatory power and exploratory use. As our strategy of combining explication and evaluation shows, integration should be understood as a strategy of combining two different perspectives on the object area: in our evaluation scenarios, that of empirical reader response and that of a specific quantitative procedure. This does not imply that measures of distinctiveness that have proved to have explanatory power in one qualitative respect should be assumed to be successful in all fields of research. As long as evaluation is omitted, candidate measures of distinctiveness lack explanatory power and are limited to exploratory use. In contrast to the skepticism that literary scholars have sometimes expressed about the relevance of computational literary studies to core issues of the humanities, we believe that integrating computational methods into hermeneutic literary studies can be achieved in a way that reaches higher explanatory power than the usual exploratory use of keyness measures. It can, however, only be achieved individually for concrete tasks, not once and for all on the basis of a general theoretical demonstration.


Baker, Paul, Querying keywords. Questions of Difference, Frequency and Sense in Keyword Analysis, Journal of English Linguistics 32:4 (2004), 346–359. DOI: 10.1177/0075424204269894

Blei, David M., Probabilistic Topic Models, Communications of the ACM 55:4 (2012), 77–84. DOI: 10.1145/2107736.2107741

Bondi, Marina, Perspectives on Keywords and Keyness. An Introduction, in: Marina Bondi/Mike Scott (eds.), Keyness in Texts, Amsterdam/Philadelphia 2010, 1–18. DOI: 10.1075/scl.41.01bon

Bruza, P.D. et al., Aboutness from a Commonsense Perspective, Journal of the American Society for Information Science 51:12 (2000), 1090–1105. DOI: 10.1002/1097-4571(2000)9999:9999<::AID-ASI1026>3.0.CO;2-Y

Burrows, John, ›Delta‹: a Measure of Stylistic Difference and a Guide to Likely Authorship, Literary and Linguistic Computing 17:3 (2002), 267–287. DOI: 10.1093/llc/17.3.267

Burrows, John, All the Way Through: Testing for Authorship in Different Frequency Strata, Literary and Linguistic Computing 22:1 (2007), 27–47. DOI: 10.1093/llc/fqi067

Carnap, Rudolf, Logical Foundations of Probability, Chicago/London/Toronto 1950.

Da, Nan Z., The Computational Case against Computational Literary Studies, Critical Inquiry 45 (2019), 601–639. DOI: 10.1086/702594

Duncker, Axel, Gattungssystematiken, in: Rüdiger Zymner (ed.), Handbuch Gattungstheorie, Stuttgart/Weimar 2010, 12–15.

Egbert, Jesse/Doug Biber, Incorporating Text Dispersion into Keyword Analyses, Corpora 14:1 (2019), 77–104. DOI: 10.3366/cor.2019.0162

Firth, John Rupert, The Technique of Semantics, Transactions of the Philological Society (1935), 36–72. DOI: 10.1111/j.1467-968X.1935.tb01254.x

Fish, Steven, Are Muslims Distinctive? A look at the evidence, Oxford 2011. DOI: 10.1093/acprof:osobl/9780199769209.001.0001

Fishelov, David, Genre Theory and Family Resemblance – Revisited, Poetics 20:2 (1991), 123–138. DOI: 10.1016/0304-422X(91)90002-7

Føllesdal, Dagfinn, Hermeneutics and the hypothetico-deductive method, Dialectica 33:3–4 (1979), 319–336. DOI: 10.1111/j.1746-8361.1979.tb00759.x

Fricke, Harald, Norm und Abweichung, München 1981.

Fricke, Harald, Definitionen und Begriffsformen, in: Rüdiger Zymner (ed.), Handbuch Gattungstheorie, Stuttgart/Weimar 2010, 7–10.

Gabrielatos, Costas, Keyness Analysis: Nature, Metrics and Techniques, in: Charlotte Taylor/Anna Marchi (eds.), Corpus Approaches to Discourse. A critical review, Oxford 2018, 225–258. DOI: 10.4324/9781315179346-11

Greene, Jennifer C., Is Mixed Methods Social Inquiry a Distinctive Methodology?, Journal of Mixed Methods Research 2:1 (2008), 7–22. DOI: 10.1177/1558689807309969

Gries, Stephan Th., Dispersions and Adjusted Frequencies in Corpora, International Journal of Corpus Linguistics 13:4 (2008), 403–437. DOI: 10.1075/ijcl.13.4.02gri

Gymnich, Marion/Birgit Neumann/Ansgar Nünning (eds.), Gattungstheorie und Gattungsgeschichte, Trier 2007.

Hempfer, Klaus W., Zum begrifflichen Status der Gattungsbegriffe: Von ›Klassen‹ zu ›Familienähnlichkeiten‹ und ›Prototypen‹, Zeitschrift für französische Sprache und Literatur 120:1 (2010), 14–32.

Herrmann, Berenike J./Karina van Dalen-Oskam/Christof Schöch, Revisiting Style, a Key Concept in Literary Studies, Journal of Literary Theory 9:1 (2015), 25–52. DOI: 10.1515/jlt-2015-0003

Jauß, Hans Robert, Literaturgeschichte als Provokation der Literaturwissenschaft, Konstanz 1967.

Kelle, Udo, Die Integration qualitativer und quantitativer Forschung – theoretische Grundlagen von »Mixed Methods«, Kölner Zeitschrift für Soziologie und Sozialpsychologie 69:2 (2017), 39–61. DOI: 10.1007/s11577-017-0451-4

Kilgarriff, Adam, Comparing Corpora, International Journal of Corpus Linguistics 6:1 (2001), 97–133. DOI: 10.1075/ijcl.6.1.05kil

Klimek, Sonja/Ralph Müller, Vergleich als Methode? Zur Empirisierung eines philologischen Verfahrens im Zeitalter der Digital Humanities, Journal of Literary Theory 9:1 (2015), 53–78. DOI: 10.1515/jlt-2015-0004

Lamping, Dieter, Handbuch der literarischen Gattungen, Stuttgart 2009.

Lijfijt, Jefrey et al., Significance Testing of Word Frequencies in Corpora, Digital Scholarship in the Humanities (2014), 1–24. DOI: 10.1093/llc/fqu064

Lincoln, Yvonna S./Egon G. Guba, Paradigmatic Controversies, Contradictions, and Emerging Confluences, Revisited, in: Norman Denzin/Yvonna S. Lincoln (eds.), Handbook of Qualitative Research, Thousand Oaks, CA ⁵2018, 108–150.

Maron, M.E., On Indexing, Retrieval and the Meaning of About, Journal of the American Society for Information Science 28:1 (1977), 38–43. DOI: 10.1002/asi.4630280107

Müller, Ralph, Kategorisieren, in: Rüdiger Zymner (ed.), Handbuch Gattungstheorie, Stuttgart/Weimar 2010, 21–23.

Paquot, Magali/Yves Bestgen, Distinctive Words in Academic Writing: A Comparison of three Statistical Tests for Keyword Extraction, DIAL – Digital Access to Libraries (17.09.2021), 1–23 (originally published in Language and Computers 68 [2009], 247–269).

Rácz, Péter, Salience in Sociolinguistics. A Quantitative Approach, Berlin/Boston 2013. DOI: 10.1515/9783110305395

Ryan, Marie L., Introduction: On the Why, What, and How of Generic Taxonomy, Poetics 10:2–3 (1981), 109–126. DOI: 10.1016/0304-422X(81)90030-9

Ryle, Gilbert, About, Analysis 1:1 (1933), 10–12. DOI: 10.1093/analys/1.1.10

Schmidt-Hidding, Wolfgang, Zur Methode wortvergleichender und wortgeschichtlicher Studien, in: Europäische Schlüsselwörter, Vol. I: Humor und Witz, ed. by Sprachwissenschaftlichen Colloquium (Bonn), München 1963, 18–33.

Schröter, Julian, Gattungsgeschichte und ihr Gattungsbegriff am Beispiel der Novellen, Journal of Literary Theory 13:2 (2019), 227–257. DOI: 10.1515/jlt-2019-0009

Scott, Mike, PC Analysis of Key Words – and Key Key Words, System 25:1 (1997), 1–13. DOI: 10.1016/S0346-251X(97)00011-0

Scott, Mike, WordSmith Tools Manual. Version 3.0, Oxford 1998.

Šklovskij, Viktor, Die Kunst als Verfahren [1917], in: Jurij Striedter (ed.), Russischer Formalismus, München 1969, 5–35.

Stamatatos, Efstathios, A Survey of Modern Authorship Attribution Methods, Journal of the American Society for Information Science and Technology 60:3 (2009), 538–556. DOI: 10.1002/asi.21001

Strube, Werner, Sprachanalytisch-philosophische Typologie literaturwissenschaftlicher Begriffe, in: Christian Wagenknecht (ed.), Zur Terminologie der Literaturwissenschaft, Stuttgart 1989, 35–49.

Stubbs, Michael, Three Concepts of Keywords, in: Marina Bondi/Mike Scott (eds.), Keyness in Texts. Corpus Linguistic Investigations, Amsterdam/Philadelphia 2010, 21–42. DOI: 10.1075/scl.41.03stu

Swales, John, Genre Analysis. English in Academic and Research Setting, Cambridge 1990.

Toolan, Michael, The Theory and Philosophy of Stylistics, in: Peter Stockwell/Sara Whiteley (eds.), Handbook of Stylistics, Cambridge 2014, 13–31. DOI: 10.1017/CBO9781139237031.003

Tukey, John W., Exploratory Data Analysis, London et al. 1977.

Underwood, Ted, Distant Horizons. Digital Evidence and Literary Change, Chicago 2019. DOI: 10.7208/chicago/9780226612973.001.0001

Voßkamp, Wilhelm, Gattungen als literarisch-soziale Institutionen, in: Walter Hinck/Alexander von Bormann (eds.), Textsortenlehre – Gattungsgeschichte, Heidelberg 1977, 27–44.

Walton, Kendall L., Categories of Art, Philosophical Review 79:3 (1970), 334–367. DOI: 10.4324/9781315303673-102

Warren, Martin, Identifying Aboutgrams in Engineering Texts, in: Marina Bondi/Mike Scott (eds.), Keyness in Texts, Amsterdam/Philadelphia 2010, 113–126. DOI: 10.1075/scl.41.09war

Williams, Raymond, Keywords. A Vocabulary of Culture and Society [1976], revised edition, New York 1983.

Published Online: 2021-11-06
Published in Print: 2021-12-31

© 2020 Walter de Gruyter GmbH, Berlin/Boston
