Published by De Gruyter Mouton January 12, 2016

Mass counts in World Englishes: A corpus linguistic study of noun countability in non-native varieties of English

  • Daniel Schmidtke and Victor Kuperman


Research on the morpho-syntax of non-native varieties of English has reported a widespread presence of mass noun pluralization, as in baggages, equipments and softwares. In this paper we report a corpus linguistic study designed to test this claim empirically. We examined the purported prevalence of noun countability in World Englishes in a 1.9 billion-token mega-corpus of global varieties of English. In a comparison of native and non-native varieties of English, we first algorithmically isolated nouns that are more frequently pluralized in non-native varieties. The results indicate a continuum of non-native English countability, along which mass nouns occupy the most extreme tail of inflated occurrences of noun pluralization. In an exploratory analysis, we then examined the similarity in noun countability behaviour across non-native varieties of English. This analysis revealed that geographically proximate countries in which non-native varieties of English are spoken are most similar in the extent to which they pluralize nouns. We argue that noun countability is a phenomenon best viewed as a gradient that is also regionally dependent.

Funding statement: This work was supported by the Ontario Trillium Award and a Graduate fellowship awarded by the Lewis & Ruth Sherman Centre for Digital Scholarship to the first author. The second author’s contribution was supported in part by the SSHRC Insight Development grant 430-2012-0488, the NSERC Discovery grant 402395–2012, the Early Researcher Award from the Ontario Research Fund, and the NIH R01 HD 073288 (PI Julie A. Van Dyke).


Thanks are due to Paweł Mandera for contributions to computational analysis in the initial stages of this project. Thanks are also due to Christopher Hall, Anna Moro and Ivona Kucerová for their valuable comments on earlier drafts of this work, to the attendees of the Conference of the Canadian Linguistics Association, Brock University, ON, Canada, where this work was presented in May 2014, and to the attendees of the 9th International Conference on the Mental Lexicon, Niagara-on-the-Lake, ON, Canada, September 2014. Thanks are also due to the students of the Department of Languages and Linguistics at York St. John University, UK, for their valuable comments on the early stages of this work.

Appendix A

As stated in Statistical analyses, the main advantage of the LORIDP measure is that it shrinks the frequency of the words under comparison towards a prior frequency in a large background corpus. This enables the researcher to detect differences between words that are very frequent. To verify that LORIDP does indeed correct for high-frequency items, and to support our choice of this measure, we compared it with two competing measures that have also been used to detect the overrepresentation of a word in one corpus relative to another. For the set of 6,227 nouns that we identified in the data collection procedure, we first computed the Damerau (1993) ratio of relative frequencies. This measure estimates the difference between the frequency of the plural form of noun w in the Outer Circle (i) and in the Inner Circle (j). The relative frequency ratio r is computed as,


where n_i is the total number of noun tokens in the Outer Circle subdivision of the GloWbE corpus (i) and n_j is the total number of noun tokens in the Inner Circle subdivision of the GloWbE corpus (j).
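
The ratio of relative frequencies described above can be illustrated with a minimal Python sketch. It assumes that the ratio is oriented as Outer Circle over Inner Circle; the function name, the argument names and the counts in the usage note are hypothetical and are not taken from the study.

```python
def relative_frequency_ratio(plural_outer, n_outer, plural_inner, n_inner):
    """Ratio of the plural form's relative frequency in two subcorpora.

    plural_outer, plural_inner: token counts of the plural form of one noun.
    n_outer, n_inner: total noun tokens in each subcorpus.
    Assumed orientation: Outer Circle in the numerator, so values above 1
    indicate relatively more pluralization in the Outer Circle.
    """
    if n_outer <= 0 or n_inner <= 0:
        raise ValueError("subcorpus sizes must be positive")
    rel_outer = plural_outer / n_outer
    rel_inner = plural_inner / n_inner
    if rel_inner == 0:
        # The ratio is undefined when both counts are zero,
        # and unbounded when only the Inner Circle count is zero.
        return float("inf") if rel_outer > 0 else float("nan")
    return rel_outer / rel_inner
```

With hypothetical counts of 30 plural tokens among 1,000 noun tokens in one subcorpus and 10 among 2,000 in the other, `relative_frequency_ratio(30, 1000, 10, 2000)` compares relative frequencies of 0.03 and 0.005. Because the raw ratio is unbounded, very frequent and very rare nouns can behave quite differently under it.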

We also calculated the simple difference coefficient used by Hofland and Johansson (1982) and Leech and Fallon (1992), which is computed as,
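
A difference coefficient of this kind is usually stated as the difference between two frequencies divided by their sum. The sketch below assumes that form and applies it to relative frequencies, on the assumption that the two subdivisions differ in size; the function and argument names are hypothetical.

```python
def difference_coefficient(plural_outer, n_outer, plural_inner, n_inner):
    """Difference of two relative frequencies divided by their sum.

    The result lies between -1 (plural form attested only in the Inner
    Circle) and +1 (attested only in the Outer Circle); 0 means the two
    relative frequencies are equal. Normalising by subcorpus size is an
    assumption of this sketch.
    """
    if n_outer <= 0 or n_inner <= 0:
        raise ValueError("subcorpus sizes must be positive")
    rel_outer = plural_outer / n_outer
    rel_inner = plural_inner / n_inner
    total = rel_outer + rel_inner
    if total == 0:
        raise ValueError("the plural form is unattested in both subcorpora")
    return (rel_outer - rel_inner) / total
```

Unlike the ratio of relative frequencies, this coefficient is bounded, which makes values easier to compare across nouns.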


To test whether LORIDP affords an advantage over the other difference coefficients, we separately correlated the total frequency of each word in our data set with each coefficient of the over- or underrepresentation of the plural form of the word in Outer Circle Englishes. Because the LORIDP measure shrinks word frequency, it is expected to correlate less strongly with word frequency than the other measures do. A weak correlation between total frequency and LORIDP would validate LORIDP as a measure that estimates the difference in pluralization across the Inner and Outer Circle while controlling for the effect of total frequency. As reported in Table 3, the LORIDP measure yields the weakest correlation with word frequency. We take this observation as support for the adoption of the LORIDP measure in the present investigation.
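
For orientation, the shrinkage behaviour of a log-odds ratio with an informative Dirichlet prior can be sketched as follows. The sketch follows the general formulation in Monroe, Colaresi and Quinn (2008) and is not the implementation used in this study; the function name, the argument names and the way the prior counts are supplied are assumptions.

```python
import math


def loridp_z(y_i, y_j, n_i, n_j, prior_w, prior_total):
    """z-scored log-odds ratio with an informative Dirichlet prior.

    y_i, y_j: counts of the item in corpora i and j.
    n_i, n_j: total token counts of corpora i and j.
    prior_w: count of the item in a background corpus (must be > 0).
    prior_total: total token count of the background corpus.
    """
    if prior_w <= 0:
        raise ValueError("the prior count must be positive")
    # Log-odds of the item in each corpus after adding the prior counts.
    log_odds_i = math.log((y_i + prior_w) / (n_i + prior_total - y_i - prior_w))
    log_odds_j = math.log((y_j + prior_w) / (n_j + prior_total - y_j - prior_w))
    delta = log_odds_i - log_odds_j
    # Approximate variance of the difference in log-odds.
    variance = 1.0 / (y_i + prior_w) + 1.0 / (y_j + prior_w)
    return delta / math.sqrt(variance)
```

Positive values indicate overrepresentation in corpus i. With the observed counts held fixed, a heavier prior pulls both log-odds towards the background rate, so the resulting score moves towards zero.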

Table 3: Spearman’s ρ for the correlation between each difference metric and the summed frequency of the plural and singular form of nouns in GloWbE.

Difference coefficient metric                            ρ
Damerau’s r                                              –0.17
Johansson’s difference coefficient                       –0.19
Log-odds ratio informative Dirichlet prior (LORIDP)      –0.07
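
The rank correlations in Table 3 are of the kind computed by the following sketch, which obtains Spearman's ρ as the Pearson correlation of average ranks. It is a generic illustration in plain Python rather than the code behind the reported values; all names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)
```

Passing one list of summed singular and plural frequencies and one list of difference scores, in the same noun order, returns a value between -1 and 1; values near zero indicate that the score is largely independent of total frequency.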


Allan, K. 1980. Nouns and countability. Language 56. 541–567. doi:10.2307/414449

Alsagoff, L. & C. L. Ho. 1998. The grammar of Singapore English. In Joseph A. Foley, Thiru Kandiah, Zhiming Bao, Anthea Fraser Gupta, Lubna Alsagoff, Ho Chee Lick, Lionel Wee, Ismail S. Talib & Wendy Bokhorst-Heng (eds.), English in new cultural contexts: Reflections from Singapore, 127–151. Singapore: Singapore Institute of Management and Oxford University Press.

Bale, A. & D. Barner. 2011. Mass-count distinction. In M. Aronoff (ed.), Oxford Bibliographies Online, 1–26. Oxford: Oxford University Press. doi:10.1093/obo/9780199772810-0028

Barner, D., S. Inagaki & P. Li. 2009. Language, thought, and real nouns. Cognition 111(3). 329–344. doi:10.1016/j.cognition.2009.02.008

Bautista, M. L. S. & A. B. Gonzalez. 2008. Southeast Asian Englishes. In B. B. Kachru, Y. Kachru & C. L. Nelson (eds.), The Handbook of World Englishes, 130–144. Oxford: Blackwell Publishers.

Björkman, B. 2008. English as the lingua franca of engineering: The morphosyntax of academic speech events. Nordic Journal of English Studies 7(3). 103–122. doi:10.35360/njes.103

Bond, F. & C. Vatikiotis-Bateson. 2002. Using an ontology to determine English countability. Proceedings of the 19th international conference on computational linguistics, volume 1, 1–7. doi:10.3115/1072228.1072280

Brysbaert, M. & B. New. 2009. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41(4). 977–990. doi:10.3758/BRM.41.4.977

Brysbaert, M., B. New & E. Keuleers. 2012. Adding part-of-speech information to the SUBTLEX-US word frequencies. Behavior Research Methods 44(4). 991–997. doi:10.3758/s13428-012-0190-4

Cane, G. 1994. The English language in Brunei Darussalam. World Englishes 13(3). 351–360. doi:10.1111/j.1467-971X.1994.tb00321.x

Crystal, D. 1997. English as a global language. Cambridge, UK: Cambridge University Press.

Crystal, D. 2008. Two thousand million? English Today 93. 3. doi:10.1017/S0266078408000023

Cysouw, M. 2013. Disentangling geography from genealogy. In P. Auer, M. Hilpert, A. Stukenbrock & B. Szmrecsanyi (eds.), Space in language and linguistics: Geographical, interactional, and cognitive perspectives, 21–37. Berlin: de Gruyter. doi:10.1515/9783110312027.21

Damerau, F. J. 1993. Generating and evaluating domain-oriented multi-word terms from texts. Information Processing & Management 29(4). 433–447. doi:10.1016/0306-4573(93)90039-G

Davies, M. 2013. Corpus of Global Web-Based English: 1.9 billion words from speakers in 20 countries. (accessed 11 November 2013).

Davies, M. & R. Fuchs. 2015. Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbE). English World-Wide 36(1). 1–28. doi:10.1075/eww.36.1.01dav

Fieder, N., L. Nickels & B. Biedermann. 2014. Representation and processing of mass and count nouns: A review. Frontiers in Psychology 5. 589. doi:10.3389/fpsyg.2014.00589

Gonzalez, A. 1983. When does an error become a feature of Philippine English. Varieties of English in Southeast Asia, 150–172.

Gries, S. T. 2014. Frequency tables: Tests, effect sizes, and explorations. In D. Glynn & J. Robinson (eds.), Corpus methods for semantics: Quantitative studies in polysemy and synonymy, 365–389. Amsterdam & Philadelphia: John Benjamins. doi:10.1075/hcp.43.14gri

Hall, C. J. 2013. Cognitive contributions to plurilithic views of English and other languages. Applied Linguistics 34(2). 211–231. doi:10.1093/applin/ams042

Hall, C. J. 2014. Moving beyond accuracy: From tests of English to tests of ‘Englishing’. ELT Journal 68(4). 376–385. doi:10.1093/elt/ccu016

Hall, C. J., D. Schmidtke & J. Vickers. 2013. Countability in world Englishes. World Englishes 32(1). 1–22. doi:10.1111/weng.12001

Heuven, W. J. van, P. Mandera, E. Keuleers & M. Brysbaert. 2014. SUBTLEX-UK: A new and improved word frequency database for British English. The Quarterly Journal of Experimental Psychology 67(6). 1176–1190. doi:10.1080/17470218.2013.850521

Hofland, K. & S. Johansson. 1982. Word frequencies in British and American English. Bergen, Norway: Norwegian Computing Centre for the Humanities.

Imai, M. & D. Gentner. 1997. A cross-linguistic study of early word meaning: Universal ontology and linguistic influence. Cognition 62(2). 169–200. doi:10.1016/S0010-0277(96)00784-6

Jackendoff, R. 1991. Parts and boundaries. Cognition 41(1). 9–45. doi:10.1016/0010-0277(91)90031-X

Jenkins, J. 2003. World Englishes: A resource book for students. London: Psychology Press.

Jurafsky, D., V. Chahuneau, B. R. Routledge & N. A. Smith. 2014. Narrative framing of consumer sentiment in online restaurant reviews. First Monday 19(4). Retrieved from http://

Kachru, B. B. 1985. Standards, codification and sociolinguistic realism: The English language in the Outer Circle. In R. Quirk & H. G. Widdowson (eds.), English in the world: Teaching and learning the language and literatures, 11–30. Cambridge: Cambridge University Press.

Kachru, B. B. 1990. World Englishes and applied linguistics. World Englishes 9(1). 3–20. doi:10.1075/z.lkul2.19kac

Kachru, B. B. 1992. The other tongue: English across cultures. Chicago: University of Illinois Press.

Kulkarni, R., S. Rothstein & A. Treves. 2013. A statistical investigation into the cross-linguistic distribution of mass and count nouns: Morphosyntactic and semantic perspectives. Biolinguistics 7. 132–168. doi:10.5964/bioling.8959

Langacker, R. W. 2008. Cognitive grammar: A basic introduction. New York, USA: Oxford University Press. doi:10.1093/acprof:oso/9780195331967.001.0001

Leech, G. & R. Fallon. 1992. Computer corpora: What do they tell us about culture? International Computer Archive of Modern English Journal 16.

Leitner, G. 1992. English as a pluricentric language. In M. Clyne (ed.), Pluricentric languages: Differing norms in different nations, 62, 178–237. Berlin: Mouton de Gruyter.

Link, G. 1983. The logical analysis of plurals and mass terms: A lattice-theoretical approach. In R. Bäuerle, C. Schwarze & A. von Stechow (eds.), Meaning, use and interpretation of language, 302–323. Berlin: de Gruyter.

Mair, C. 2013. The World System of Englishes: Accounting for the transnational importance of mobile and mediated vernaculars. English World-Wide 34(3). 253–278. doi:10.1075/eww.34.3.01mai

Mair, C. 2015. Response to Davies and Fuchs. English World-Wide 36(1). 29–33. doi:10.1075/eww.36.1.02mai

Makoni, S. & A. Pennycook. 2007. Disinventing and reconstituting languages, 62. Clevedon, UK: Multilingual Matters. doi:10.21832/9781853599255

McArthur, T. 2002. The Oxford Guide to World English. Oxford: Oxford University Press.

McKay, S. & W. D. Bokhorst-Heng. 2008. International English in its sociolinguistic contexts: Towards a socially sensitive EIL pedagogy. London: Routledge.

Mesthrie, R. & R. M. Bhatt. 2008. World Englishes: The study of new linguistic varieties. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511791321

Mollin, S. 2007. New variety or learner English?: Criteria for variety status and the case of Euro-English. English World-Wide 28(2). 167–185. doi:10.1075/eww.28.2.04mol

Monroe, B. L., M. P. Colaresi & K. M. Quinn. 2008. Fightin’ words: Lexical feature selection and evaluation for identifying the content of political conflict. Political Analysis 16(4). 372–403. doi:10.1093/pan/mpn018

Mukherjee, J. 2015. Response to Davies and Fuchs. English World-Wide 36(1). 34–37. doi:10.1075/eww.36.1.02muk

Nelson, G. 2015. Response to Davies and Fuchs. English World-Wide 36(1). 38–40. doi:10.1075/eww.36.1.02nel

Nerbonne, J. & P. Kleiweg. 2007. Toward a dialectological yardstick. Journal of Quantitative Linguistics 14(2–3). 148–166. doi:10.1080/09296170701379260

Platt, J. T. & H. Weber. 1980. English in Singapore and Malaysia: Status, features, functions. Kuala Lumpur: Oxford University Press.

Platt, J. T., H. Weber & M. L. Ho. 1984. The New Englishes. London: Routledge & Kegan Paul.

R Core Team. 2014. R: A language and environment for statistical computing. Vienna, Austria.

Schmied, J. 2008a. East African Englishes. In B. B. Kachru, Y. Kachru & C. L. Nelson (eds.), The Handbook of World Englishes, 188–202. Oxford: Blackwell Publishers. doi:10.1002/9780470757598.ch12

Schmied, J. 2008b. East African English (Kenya, Uganda, Tanzania): Morphology and syntax. Varieties of English 4. 451–471. doi:10.1515/9783110197181-127

Schneider, E. W. 2003a. The dynamics of New Englishes: From identity construction to dialect birth. Language 79(2). 233–281. doi:10.1353/lan.2003.0136

Schneider, E. W. 2003b. Evolutionary patterns of New Englishes and the special case of Malaysian English. Asian Englishes 6(2). 44–63. doi:10.1080/13488678.2003.10801118

Schneider, E. W. 2007. Postcolonial English: Varieties around the world. New York, USA: Cambridge University Press. doi:10.1017/CBO9780511618901

Schneider, E. W. 2010. Developmental patterns of English: Similar or different? In A. Kirkpatrick (ed.), The Routledge Handbook of World Englishes, 372–384. New York: Routledge.

Séguy, J. 1971. La relation entre la distance spatiale et la distance lexicale. Revue de linguistique romane 35. 335–357. Palais de l’université.

Seidlhofer, B. 2004. Research perspectives on teaching English as a lingua franca. Annual Review of Applied Linguistics 24. 209–239. doi:10.1017/S0267190504000145

Suzuki, R. & H. Shimodaira. 2006. Pvclust: An R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22(12). 1540–1542. doi:10.1093/bioinformatics/btl117

Thomason, S. G. & T. Kaufman. 2001. Language contact. Edinburgh: Edinburgh University Press.

Tobler, W. R. 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46. 234–240. doi:10.2307/143141

Trudgill, P. 1986. Dialects in contact. Oxford: Blackwell.

Ward Jr, J. H. 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58(301). 236–244. doi:10.1080/01621459.1963.10500845

Wierzbicka, A. 1988. The semantics of grammar, 18. Amsterdam/Philadelphia: John Benjamins Publishing. doi:10.1075/slcs.18

Published Online: 2016-1-12
Published in Print: 2017-5-1

© 2017 Walter de Gruyter GmbH, Berlin/Boston
