Journal of English Philology

Ed. by Kornexl, Lucia / Lenker, Ursula / Middeke, Martin / Rippl, Gabriele / Zapf, Hubert

4 Issues per year

CiteScore 2017: 0.17

SCImago Journal Rank (SJR) 2017: 0.148
Source Normalized Impact per Paper (SNIP) 2017: 0.672

Volume 136, Issue 2


Mining the Web for New Words: Semi-Automatic Neologism Identification with the NeoCrawler

Daphné Kerremans / Jelena Prokić
Published Online: 2018-06-13 | DOI: https://doi.org/10.1515/ang-2018-0032


Lexical innovation is omnipresent and constantly at work. Studies that aim to understand the process of lexical innovation and the subsequent diffusion of neologisms therefore benefit from systematic methods of neologism identification. In the past, retrieval procedures largely consisted of manual participant observation and close reading. More recently, attempts have been made to design automated identification procedures, assisted by state-of-the-art natural language processing techniques and tools. Beginning with a discussion of the most commonly used neologism detection methods and applications in linguistics, the present paper describes a semi-automatic approach to identifying new words on the web, the NeoCrawler’s Discoverer, which was developed as part of a project on the incipient diffusion of lexical innovations. Every day, the Discoverer processes large batches of online text in English and automatically identifies unknown grapheme sequences as potential neologism candidates by means of a dictionary matching procedure in which the individual tokens are matched against a very large dictionary. These potential neologisms are subsequently presented to the user for manual evaluation of their neologism status. Finally, candidates are added to the NeoCrawler’s database for continuous close monitoring of their development in the online speech community. We argue that dictionary matching offers an efficient method for semi-automatically extracting potential instances of lexical innovation, with high precision and high recall compared to previous approaches.
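The dictionary matching step described in the abstract can be illustrated with a minimal sketch. This is not the NeoCrawler's implementation: the tokenization pattern, the tiny `KNOWN_WORDS` set (a stand-in for the "very large dictionary"), the function name `find_candidates`, and the sample sentence are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical stand-in for the very large dictionary; a real system
# would load hundreds of thousands of known word forms.
KNOWN_WORDS = {"the", "a", "new", "word", "spread", "quickly", "online", "is"}

# Assumed tokenization: runs of letters, optionally joined by an
# internal apostrophe or hyphen. The paper's own tokenizer may differ.
TOKEN_PATTERN = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def find_candidates(text, dictionary):
    """Return a frequency count of tokens that are not in the dictionary."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return Counter(token for token in tokens if token not in dictionary)


# Invented sample text; "detweet" is used here purely as an example
# of an unknown grapheme sequence.
sample = "The new word detweet spread quickly online. Detweet is a new word."
candidates = find_candidates(sample, KNOWN_WORDS)

# Unknown sequences are only candidates; in the approach described above
# they would next be shown to a user for manual evaluation.
for token, count in candidates.most_common():
    print(token, count)
```

A lookup against a set keeps each token check at roughly constant time, which matters when large batches of text are processed daily. Note that such a filter also surfaces typos and proper names, which is one reason a manual evaluation step would follow.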

About the article

Published Online: 2018-06-13

Published in Print: 2018-06-11

Citation Information: Anglia, Volume 136, Issue 2, Pages 239–268, ISSN (Online) 1865-8938, ISSN (Print) 0340-5222, DOI: https://doi.org/10.1515/ang-2018-0032.

© 2018 Walter de Gruyter GmbH, Berlin/Boston.