Lexical innovation is omnipresent and constantly at work. Studies that aim to understand the process of lexical innovation and the subsequent diffusion of neologisms therefore benefit from systematic methods of neologism identification. In the past, retrieval procedures largely relied on manual participant observation and close reading. More recently, attempts have been made to design automated identification procedures assisted by state-of-the-art natural language processing techniques and tools. Beginning with a discussion of the most commonly used neologism detection methods and applications in linguistics, the present paper describes a semi-automatic approach to identifying new words on the web, the NeoCrawler’s Discoverer, which has been developed as part of a project on the incipient diffusion of lexical innovations. Every day, the Discoverer processes large batches of online English text and automatically identifies unknown grapheme sequences as potential neologism candidates by means of a dictionary matching procedure, in which the individual tokens are matched against a very large dictionary. These candidates are then presented to the user, who manually evaluates their neologism status. Finally, candidates are added to the NeoCrawler’s database for continuous close monitoring of their development in the online speech community. We argue that dictionary matching offers an efficient method of semi-automatically extracting potential instances of lexical innovation, with high precision and high recall compared to previous approaches.
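The dictionary matching step summarized above can be illustrated with a minimal sketch. This is not the Discoverer's actual implementation: the tokenizer pattern, the function name `find_candidates`, and the toy word list are all assumptions made for illustration, and a real system would load a very large dictionary and apply further filtering before manual review.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of ASCII letters, optionally joined by internal
# hyphens or apostrophes. The real system's tokenization is not described here.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def find_candidates(text, known_words):
    """Count lowercased tokens in `text` that are absent from `known_words`."""
    tokens = (match.group(0).lower() for match in TOKEN_RE.finditer(text))
    return Counter(token for token in tokens if token not in known_words)


# Toy stand-in for the "very large dictionary" (assumption: a plain set of
# lowercased word forms).
known_words = {"the", "new", "app", "is", "a", "for", "people", "who", "love", "to"}

sample = "The new app is a glamping app for people who love to glamp."
candidates = find_candidates(sample, known_words)

# Unknown tokens are only candidates; a human reviewer would still decide
# whether each one is a genuine neologism, a typo, or a proper noun.
for word, count in candidates.most_common():
    print(word, count)
```

In this sketch, every token missing from the word list is flagged, so typos and names surface alongside real innovations. That is why the approach is semi-automatic: the manual evaluation step remains necessary.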
A renowned journal of English philology, Anglia was founded in 1878 by Moritz Trautmann and Richard P. Wülker. It is thus the oldest journal of English Studies in existence. Anglia publishes essays on the English language and linguistic history, on English literature of the Middle Ages and the modern period, on American literature, on new literatures in English, as well as on general and comparative literary studies.