
ICAME Journal

1 Issue per year

Open Access

Guidelines for normalising Early Modern English corpora: Decisions and justifications

Dawn Archer / Merja Kytö / Alistair Baron / Paul Rayson
Published Online: 2015-04-01 | DOI: https://doi.org/10.1515/icame-2015-0001


Corpora of Early Modern English have been collected and released for research for a number of years. With large-scale digitisation activities gathering pace in the last decade, much more historical textual data is now available for research on numerous topics, including historical linguistics and conceptual history. We summarise previous research which has shown that historical spelling variants must be mapped to modern equivalents before natural language processing and corpus linguistics methods can be applied successfully. Manual and semi-automatic methods have been devised to support this normalisation and standardisation process. We argue that it is important to develop a linguistically meaningful rationale in order to achieve good results from this process. To that end, we propose a number of guidelines for normalising corpora and show how these guidelines have been applied in the Corpus of English Dialogues.
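As a rough illustration of what mapping spelling variants to modern equivalents involves, the following minimal sketch applies a lookup table to a text while retaining each original form alongside its normalised form. It is not the method of any tool named in this article; the variant list, the function names `normalise` and `modern_text`, and the decision to keep original and modern forms as pairs are all assumptions made for the example.

```python
import re

# Hypothetical, hand-made variant list for illustration only;
# a real project would derive its list from the corpus and its guidelines.
VARIANTS = {
    "loue": "love",
    "haue": "have",
    "vertue": "virtue",
}

# Split text into alternating runs of letters and non-letters,
# so that punctuation and spacing survive unchanged.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def normalise(text, variants=VARIANTS):
    """Return a list of (original, normalised) pairs for each token."""
    pairs = []
    for token in TOKEN.findall(text):
        modern = variants.get(token.lower())
        if modern is None:
            # Unknown tokens (including punctuation) are left as they are.
            pairs.append((token, token))
        else:
            # Carry over an initial capital from the original spelling.
            if token[0].isupper():
                modern = modern.capitalize()
            pairs.append((token, modern))
    return pairs


def modern_text(pairs):
    """Join the normalised forms back into running text."""
    return "".join(modern for _, modern in pairs)


pairs = normalise("I haue Loue, and vertue.")
print(modern_text(pairs))
```

Keeping the original form next to the normalised one means that a normalisation decision can later be inspected or reversed, which matters when the choice of modern equivalent is itself a linguistic judgement.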



About the article

Published in Print: 2015-03-01

Citation Information: ICAME Journal, Volume 39, Issue 1, Pages 5–24, ISSN (Online) 1502-5462, DOI: https://doi.org/10.1515/icame-2015-0001.

© 2015. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License (CC BY-NC-ND 3.0).
