Introducing Idioms in the Galician WordNet: Methods, Problems and Results

María Álvarez de la Granja 1, Xosé María Gómez Clemente 2 and Xavier Gómez Guinovart 3
  • 1 Instituto da Lingua Galega, Universidade de Santiago de Compostela, Santiago de Compostela, 15782, Galicia, Spain
  • 2 Departamento de Filoloxía Galega e Latina, Universidade de Vigo, Vigo, 36200, Galicia, Spain
  • 3 Departamento de Tradución e Lingüística, Universidade de Vigo, Vigo, 36200, Galicia, Spain

Abstract

This study describes the introduction of verbal idioms into Galnet, the Galician version of the semantic network WordNet, which has traditionally included few phraseological units. To enhance Galnet, we compiled a list of 803 Galician verbal idioms and reviewed each one individually to assess whether it could be added to an existing WordNet synset (a group of synonyms expressing the same concept). Of those 803 idioms, 490 (61%) could be included in the network. In addition, Galnet was enlarged with 750 further verbal idioms, most of them synonyms or variants of the former. We present the working methodology for the experiment and an analysis of the results, highlighting the main problems encountered when introducing idioms into Galnet. We also discuss the reasons that prevented the inclusion of some expressions and the criteria used to introduce the idioms that were finally added to the network.
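As a rough illustration of the procedure summarised above, the following Python sketch models a synset as a set of variants and attaches an idiom only when a reviewer has found an existing synset for it. The class and function names, the identifier "syn-die" and the English example idiom are all hypothetical and chosen for illustration; they do not reflect Galnet's actual data format or tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Synset:
    """A concept expressed by a group of synonymous variants."""
    synset_id: str  # hypothetical identifier, not a real WordNet offset
    gloss: str
    variants: Set[str] = field(default_factory=set)


def add_idiom(synsets: Dict[str, Synset], idiom: str,
              synset_id: Optional[str]) -> bool:
    """Attach an idiom to an existing synset.

    Returns False when the reviewer found no matching synset
    (synset_id is None) or the identifier is unknown.
    """
    if synset_id is None or synset_id not in synsets:
        return False
    synsets[synset_id].variants.add(idiom)
    return True


def coverage(included: int, total: int) -> float:
    """Percentage of reviewed idioms that could be included."""
    return 100 * included / total


# Illustrative data only: one English synset and one idiom.
synsets = {"syn-die": Synset("syn-die", "stop living", {"die"})}
add_idiom(synsets, "kick the bucket", "syn-die")

# Figures reported in the abstract: 490 of 803 idioms included.
print(round(coverage(490, 803)))  # 61
```

The sketch only covers the case where an idiom fits an existing synset; idioms for which no matching synset exists are simply rejected here, mirroring the expressions that could not be included.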



OPEN ACCESS


Open Linguistics is a new academic peer-reviewed journal covering all areas of linguistics. The objective of this journal is to foster free exchange of ideas and provide an appropriate platform for presenting, discussing and disseminating new concepts, current trends, theoretical developments and research findings related to a broad spectrum of topics: descriptive linguistics, theoretical linguistics and applied linguistics.
