Introducing Idioms in the Galician WordNet: Methods, Problems and Results

Abstract This study describes the introduction of verbal idioms into the Galician version (Galnet) of the semantic network WordNet, a network that has traditionally included few phraseological units. To enhance Galnet, a list of 803 Galician verbal idioms was compiled, and each idiom was reviewed individually to assess whether it could be introduced into an existing WordNet synset (a group of synonyms expressing the same concept). Of those 803 idioms, 490 (61%) could be included in the network. In addition, Galnet was enlarged with 750 further verbal idioms, most of them synonyms or variants of the former. In this study, we present the working methodology of the experiment and an analysis of the results, in order to shed light on the most important problems found when introducing idioms into Galnet. We also discuss the reasons that prevented the inclusion of some expressions, and the criteria used to introduce the idioms that finally made it into the network.


WordNet / Galnet
This study summarises the development of the Galnet project for phraseology (the Fralnet subproject). The objective of Galnet is to build a WordNet for the Galician language, and the project is run by the TALG research group (Galician Language Technologies and Applications) at the University of Vigo. Galnet is still in progress, but it has already yielded interesting and useful results in the fields of Galician lexicology, semantics and natural language processing (Solla Portela, Gómez Guinovart 2015).
WordNet is a lexical database for the English language, organised as a semantic network where the nodes are concepts represented as sets of synonyms, and the links between nodes are semantic relations between lexical concepts (Miller et al. 1990). The nodes contain nouns, verbs, adjectives and adverbs grouped by synonymy. In WordNet terminology, a set of synonyms is called a synset, and each lemmatised synonym in a synset is considered a lexical variant of the same concept. Thus, each synset represents a distinct lexicalised concept and includes all the synonymous variants of this concept. Additionally, each synset may contain a brief definition (or gloss), which is common to every variant in the synset, and, in some cases, one or more examples of the use of the variants in context.
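To make the data model above concrete, the following Python sketch represents a synset as a record holding its lexical variants, a shared gloss and optional usage examples. It is a minimal illustration under our own assumptions: the class name `Synset`, its fields and the identifier `toy-clarify-v` are hypothetical and do not reproduce WordNet's actual file format or any library API; the variants are a subset of an English synset cited later in this study ('make free from confusion or ambiguity; make clear').

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Synset:
    """A lexicalised concept: a set of synonymous variants sharing one gloss."""
    synset_id: str                     # hypothetical identifier
    pos: str                           # part of speech: 'n', 'v', 'a' or 'r'
    variants: list[str]                # lemmatised synonyms (lexical variants)
    gloss: str = ""                    # definition common to every variant
    examples: list[str] = field(default_factory=list)  # optional usage examples

    def has_variant(self, lemma: str) -> bool:
        """True if the lemma is one of this synset's variants."""
        return lemma in self.variants


# Toy entry: a subset of the variants of an English verbal synset
clarify = Synset(
    synset_id="toy-clarify-v",
    pos="v",
    variants=["clear up", "elucidate", "illuminate", "shed light on"],
    gloss="make free from confusion or ambiguity; make clear",
)

print(clarify.has_variant("shed light on"))  # True
print(clarify.has_variant("obscure"))        # False
```

Because the gloss belongs to the synset rather than to any single variant, every variant (monolexical or multiword) inherits the same definition, which is what allows an idiom to be added simply as one more member of the list.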
In the WordNet model of lexical representation, synsets are linked by means of lexical-semantic relations. Some of the most frequent relations represented in WordNet are hypernymy/hyponymy and holonymy/meronymy for nouns; antonymy and quasi-synonymy for adjectives; antonymy and derivation for adverbs; and entailment, hypernymy/hyponymy, cause and opposition for verbs.
WordNet, which was originally developed for English, is now available in many other languages, although the English WordNet is still the most complete reference version. Created and maintained at Princeton University since 1985, its version 3.0 contains 206,941 lemmas, i.e. synonymous variants (155,287 of which are unique, non-homographic forms), grouped into 117,659 sets of synonyms or synsets.
Many of the WordNet versions in languages other than English follow the design model of EuroWordNet (Vossen 2002b), where the synsets of a particular language are linked to the synsets of the other languages through an InterLingual Index (ILI) that is unique to each concept and is mainly based on the synsets of the English WordNet. Therefore, the set of WordNet lexicons in different languages allows the synsets of any pair of languages to be connected via the ILI, thus constituting a very useful resource for language technology applications dealing with multilingual processing.
The goal of the Galnet project is to build a WordNet for the Galician language, aligned with the ILI generated by the English WordNet 3.0. This project is part of a wider endeavour for the coordinated integration of the Spanish, Catalan, Galician and Basque versions of WordNet 3.0 (González Agirre et al. 2012). The Galnet project is also aligned with the development of the Portuguese WordNet carried out within the PULO (Portuguese Unified Lexical Ontology) project at the Centro de Estudos Humanísticos da Universidade do Minho (Simões, Gómez Guinovart 2014).
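The cross-lingual lookup that the ILI makes possible can be sketched as a join on a shared concept key. In this minimal Python illustration, each language lexicon is a dictionary from an ILI code to the variants that lexicalise that concept; the code `ili-toy-0001-v`, the language tags and the function name `equivalents` are hypothetical, and the variants are taken from the English and Galician synsets for the concept of dying cited later in this study.

```python
from __future__ import annotations

# Hypothetical lexicons: language tag -> {ILI code -> variants}.
# The ILI code is invented; the variants come from examples in the text.
LEXICONS: dict[str, dict[str, list[str]]] = {
    "eng": {"ili-toy-0001-v": ["buy the farm", "cash in one's chips",
                               "give up the ghost", "go", "pass"]},
    "glg": {"ili-toy-0001-v": ["botar a alma", "ir para a pataqueira",
                               "pechar os ollos"]},
    "spa": {},  # a lexicon that has not yet covered this concept
}


def equivalents(variant: str, src: str, tgt: str,
                lexicons: dict[str, dict[str, list[str]]]) -> list[str]:
    """Collect target-language variants of every concept that contains
    `variant` in the source language, linking both via the shared ILI code."""
    found: list[str] = []
    for ili, variants in lexicons[src].items():
        if variant in variants:
            found.extend(lexicons[tgt].get(ili, []))
    return found


print(equivalents("botar a alma", "glg", "eng", LEXICONS))
# ['buy the farm', "cash in one's chips", 'give up the ghost', 'go', 'pass']
```

Since every lexicon is keyed by the same concept code, any pair of languages can be connected without a direct bilingual mapping between them.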
Table 1 shows the current development status of Galnet (version 3.0.22), which can be searched via the research group's web interface (http://sli.uvigo.gal/galnet/). It should be noted that the official distribution of Galnet, while extremely important for the dissemination and use of the resource, is just a "frozen" version of the database; the most up-to-date data can only be accessed through Galnet's web interface.
[lexical units consisting of two graphic words at their lower limit and a compound sentence at their upper limit. These units are characterised by high frequency of use and of co-occurrence of their components; by institutionalisation (understood in the sense of fixedness and semantic specialisation); by idiomaticity and potential variation; and by the degree to which all these aspects occur in different instances].
In our project, we have worked exclusively with verbal idioms, that is, phraseological units that do not have textual independence and that perform the functions of verbs (Corpas Pastor 1996: 102).
As we saw in Section 1.1, WordNet is a database structured around concepts linked to synonym series called synsets. These series are primarily monolexical, and few phraseological units are included, despite the fact that idioms are lexical units too (even if they have special semantic and structural features).
Christiane Fellbaum, one of the researchers who designed the original WordNet project at Princeton, specifically referred to some of the difficulties of introducing idioms into the system: They express concepts that cannot be fitted into WordNet's web structure either as members of existing synsets or as independent concepts, because there are no other lexicalized concepts to which they can be linked via any of the WordNet relations. In fact, if one examines idioms and their glosses in an idiom dictionary, one quickly realizes that almost all idioms express complex concepts that cannot be paraphrased by means of any of the standard lexical or syntactic categories.
Consider such examples as fish or cut bait, cook one's/somebody's goose, and drown one's sorrows/troubles. These idioms carry a lot of highly specific semantic information that would probably get lost if they were integrated into WordNet and attached to more general concepts (Fellbaum 1998b: 55).
Fellbaum's concerns are in fact justified (see Section 4.1.1), but they seem to have been taken to the extreme, since WordNet uses a notion of contextual synonymy that allows synsets to be built without total equivalence of meaning, as even a cursory look at many WordNet synsets confirms: The notion of synonymy used in Wordnet does not entail interchangeability in all contexts; by that criterion, natural languages have few synonyms. The more modest claim is that Wordnet synonyms can be interchanged in some contexts. To be careful, therefore, one should speak of synonymy relative to a context (Miller 1998: 24).
In the network, the concept is the basic unit from which semantic relations are established, and this concept is expressed by a set of synonyms, each of which may carry its own distinguishing semantic trait. In this sense, phraseological units or idioms, even if they express complex concepts, can share semantic traits with the other members of a synset, and the resulting loss of nuance would be similar to that occurring with monolexical units (see Section 4.2).
In a later study, Fellbaum herself claimed that this absence of idioms was a gap that had to be bridged "if WordNet is to be successfully used in applications requiring Word Sense Disambiguation and language pedagogy" (Osherson, Fellbaum 2010).
Aware of the quantitative and qualitative relevance of idioms in the lexicon of any language, we undertook the so-called Fralnet experiment, both to determine to what extent such units could fit into the current WordNet synsets and to enhance Galnet with as many idioms as possible. For theoretical and practical reasons, we decided to restrict our experiment to one subcategory of idioms: verbal idioms. After the experiment, Galnet had been expanded with 1237 verbal idioms, and we had shown that a large number of these units could fit into existing WordNet synsets. The following sections present the methodology and results, and discuss some of the most relevant questions raised in the process of introducing verbal idioms into this semantic network.

Corpus and Methodology
As we have just discussed, the objective of Fralnet is twofold: on the one hand, the practical goal of expanding Galnet with idioms; on the other, the theoretical purpose of understanding to what extent idioms fit this semantic network and what problems they may pose. This second objective determined our working method, i.e., to establish a corpus of expressions and then look for the appropriate synset among those already existing in WordNet. We therefore moved from the phraseological units to the synsets. The opposite path, i.e., selecting synsets and then freely enriching them with appropriate idioms, would not have allowed us to pursue our theoretical objective.
As mentioned in Section 1.2, we restricted our analysis to a subtype of idioms, verbal idioms. The reasons for this are both theoretical and practical. On the one hand, we decided to work with a set of units with relatively homogeneous behaviour, thus avoiding a high degree of heterogeneity in the corpus that could hinder proper data analysis. On the other, a corpus of verbal idioms was already available, which made the selection of idioms easier. Moreover, since the verb is one of the least represented categories in Galnet in relative terms (see Table 1), choosing these idioms was indeed appropriate, as it allowed us to increase the number of verbal units in the Galician network.
The corpus we used contained 850 verbal idioms from spoken and written Galician texts drawn from 61 sources, included as an appendix in Locucións verbais galegas [Galician Verbal Idioms] (Álvarez de la Granja 2003). We screened this initial list and removed some 6% of the expressions (49 in total). Expressions were removed because they were archaic or infrequent, because it was not clear to us whether they were verbal idioms or, above all, because we considered them Spanish interferences (castilianisms). In this regard, one has to bear in mind that Galician shows a high degree of transfer from Spanish at all levels (lexical, syntactic, phonetic) and also in its idioms. As the official trend is to eliminate these transfers or castilianisms from standard Galician (a trend also followed by Galnet), such expressions were not included in the corpus.
In some cases, we introduced minor modifications to the lemma, so that the most frequently used or most appropriate form was included (for example, ceibar o derradeiro folguexo was replaced by ceibar o derradeiro folgo, and non dar un paso was turned into the affirmative dar un paso).
We also split into two the entries for some idioms that were linked to two meanings in the corpus, because we considered those meanings too far apart: facer as beiras as 'praising' and 'courting', and ver a luz as 'be published' and 'be born'. All in all, the final corpus included 803 entries of Galician verbal idioms, which we tried to introduce into Galnet.
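The corpus figures can be checked with simple arithmetic. The sketch below assumes, as the totals suggest, that facer as beiras and ver a luz were the only entries split in two, and that the lemma modifications described above did not change the number of entries.

```python
initial_idioms = 850   # idioms in the source appendix
removed = 49           # archaic or infrequent, doubtful, or castilianisms
split_entries = 2      # assumption: only facer as beiras and ver a luz were split

screened = initial_idioms - removed        # 801
final_corpus = screened + split_entries    # 803

print(screened, final_corpus)              # 801 803
print(f"{removed / initial_idioms:.1%}")   # 5.8%, i.e. "some 6%"
```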
With this goal in mind, we analysed the meaning of each expression, looked for its English equivalent with the support of translation and lexicographical resources, and verified whether those equivalents had an entry in WordNet as a variant of any synset. Whenever this procedure did not yield positive results, we reviewed all nodes semantically close to the concept in order to find the appropriate synset. Once we had determined which idioms in the corpus could fit WordNet, we introduced them into Galnet. Likewise, we introduced variants or synonyms (idioms or otherwise, as shown in Section 3) into the appropriate synset. To this end, we used different lexicographical resources, such as dictionaries of idioms (Pena 2001, Feixó Cid 2007, López Taboada, Soto Arias 2008) and dictionaries of synonyms (Noia Campos et al. 1997, Castro Macía 2003, Gómez Clemente et al. 2014). The set of expressions introduced into Galnet was tested via a web editing interface created specifically for this purpose.
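The first, automatable step of this procedure (checking whether an English equivalent is already a variant of some synset) can be sketched as follows; the fallback of reviewing semantically close nodes was done by hand and appears here only as a flag. The function names `candidate_synsets` and `triage`, the toy synset index and its identifiers are hypothetical; the English equivalents come from examples discussed later in this study.

```python
from __future__ import annotations

# Toy index: synset id -> English variants (ids are invented for illustration)
TOY_SYNSETS: dict[str, list[str]] = {
    "toy-take-root-v": ["take root"],
    "toy-clarify-v": ["clear up", "elucidate", "shed light on"],
}


def candidate_synsets(equivalents: list[str],
                      synsets: dict[str, list[str]]) -> list[str]:
    """Ids of synsets that list any English equivalent as a variant."""
    return [sid for sid, variants in synsets.items()
            if any(eq in variants for eq in equivalents)]


def triage(equivalents: list[str],
           synsets: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Attach directly when a match exists; otherwise flag the idiom for
    manual review of semantically close nodes (not automated here)."""
    hits = candidate_synsets(equivalents, synsets)
    return ("attach", hits) if hits else ("review_neighbours", [])


# botar raíces -> take root: a direct match
print(triage(["take root"], TOY_SYNSETS))
# perder o tren -> miss the boat / miss the bus: no match in the toy index
print(triage(["miss the boat", "miss the bus"], TOY_SYNSETS))
```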

Results
For 490 verbal idioms (61%), we found one or several appropriate synsets. These idioms, including three (gañar terreo, non dar un chío and perder de vista) that were already in Galnet when we started the project, are shown in Appendix I. The remaining 313 idioms (39%), which could not be introduced into WordNet's concept list, are included in Appendix II.
Some of the entries in our definitive corpus were divided into two to be included in several synsets of related meaning (for example, dar pé was included in the synset whose gloss is 'cause to do; cause to act in a specified manner', as well as in the synset defined as 'cause to happen, occur or exist').Therefore, the total number of variants introduced in Fralnet from the idioms in our corpus was 512 (excluding this time the three that were already in Galnet).
Apart from the verbal idioms in the corpus, we introduced a further 750 that were not in the initial corpus, most of them synonyms or variants of the former. For example, in the same synset in which we introduced the expression ser uña e carne, glossed in WordNet as 'to have smooth relations', we [...]. Some of these expressions were included in several synsets, just like the idioms in the corpus. In this way, the total number of variants for these 750 expressions is 769.
Consequently, Fralnet included a total of 1237 different verbal idioms, amounting to 1281 variants of idiomatic character.
In addition, to complete the synsets, we introduced not only idioms but also monolexical units in some of them (for example, conxeniar or leirar in the synset for ser uña e carne), as well as some frequent combinations that are not strictly phraseological (for example, in the same synset, levarse ben "get on well"). These amount to 832 variants which, together with the verbal idioms mentioned above, add up to 2113 variants corresponding to 425 different synsets. These expressions can be found at http://sli.uvigo.es/galnet/ by selecting the experiment Fralnet in the last search box ("Procurar variantes galegas por experimento" [Search for Galician variants per experiment]) and leaving the first box ("Procurar variante" [Search variant]) blank.
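The figures reported in this section are mutually consistent, as the following arithmetic check shows. The one inference it makes explicit is that the total of 1237 idioms excludes the three already in Galnet (487 + 750), which matches the way the 512 corpus variants are counted.

```python
corpus = 803
included = 490                     # idioms with at least one synset (Appendix I)
excluded = corpus - included       # 313 (Appendix II)

already_in_galnet = 3              # gañar terreo, non dar un chío, perder de vista
new_corpus_idioms = included - already_in_galnet   # 487
extra_idioms = 750                 # synonyms or variants outside the corpus
total_idioms = new_corpus_idioms + extra_idioms    # 1237

idiom_variants = 512 + 769         # corpus + extra idioms, counting multi-synset entries
all_variants = idiom_variants + 832   # plus monolexical units and free combinations

print(f"included {included / corpus:.1%}, excluded {excluded / corpus:.1%}")
print(total_idioms, idiom_variants, all_variants)  # 1237 1281 2113
```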

Introducing Idioms in Galnet
This section analyses the most important problems we found when introducing idioms from our corpus into Galnet, as well as the most relevant decisions we had to make in this regard. Section 4.1 analyses the idioms that could not be introduced into Galnet, while Section 4.2 sheds light on the criteria we used to interpret synonymy and, therefore, on the type of idioms that were included in the same synset. Finally, Section 4.3 delves into three specific questions concerning the way idioms are presented in Galnet: semantic analysability, combinatorial information and the distribution of some idioms across several synsets.

Expressions Not Included in Galnet
This section illustrates the two basic reasons that prevented the introduction of idioms from our corpus into Galnet: the lack of a lexicalised concept in English, and the absence of the concept from WordNet. Below we discuss some specific cases of idioms that could not be introduced or that proved problematic.

Non-Lexicalised Concepts in English
Some of the Galician idioms expressed content for which there is no lexicalised equivalent in English. Therefore, there was no such concept in WordNet, and the expressions in our corpus could not be introduced into Galnet. For example, Galician has a verbal idiom meaning 'to leave or interrupt [somebody] without listening to what they were about to say': deixar coa palabra na boca, lit. "to leave somebody with the word in their mouth". Obviously, this content can be expressed in English through free syntactic combinations, but these are infrequent in WordNet. Although the database can introduce non-lexicalised concepts needed to fill lexical gaps in WordNet's relational structure (Fellbaum 1998a: 5-6), in no case did this option open the door to including the Galician idioms in our corpus. Other similar examples are devolver a pelota "to give the ball back" ('to respond to what [another person] said or did using similar words or actions'), meterse a redentor "to play the Redeemer" ('to solve problems, especially those of others'), oír campás e non saber onde as tocan "to hear bells ring but not know where" ('to have access to some information but misinterpret it or interpret it only partially'), etc. This list must of course be read with some caution, as we cannot claim for certain that no lexicalised option exists in English, but our analysis revealed that 136 of the excluded expressions (around 43%) fall into this category. An important remark, however, is that in all cases the gap found is pragmatic and not cultural; that is, the concepts are familiar, but they are not expressed in English through a single lexical unit: A cultural gap is a concept not known in the English/American culture, e.g. the Dutch noun citroenjenever, which is a kind of gin made out of lemon skin, or the Dutch verb: klunen (to walk on skates over land from one frozen water to another). Pragmatic gaps are caused by lexicalization differences between languages, in the sense that in this case the concept is known but not expressed by a single lexicalized form in English (Vossen 2002a: 39).
The size of this group of expressions is to be expected, since idioms are regularly used to express complex, nuanced content, with a high degree of semantic specialisation and with particular discursive behaviours that are not always found in other languages. For example, both Galician and English have words to express relatively "simple" content such as 'respond' or 'solve', but when these concepts are enriched and become more complex and nuanced, lexicalised correspondences are harder to find. Indeed, as we have just shown, fixed content in Galician such as 'to respond to what [another person] said or did using similar words or actions' or 'to solve problems, especially those of others' is not lexicalised in English.

Lexicalised Concepts in English but Missing from WordNet
For almost 57% of the excluded Galician idioms, we found an English expression with identical or very close meaning, which, however, was missing from WordNet. Approximately one in four of these had an English equivalent that not only matched the figurative meaning but was also very close or identical to the Galician literal sense. Nonetheless, we cannot speak of full equivalence, because this would require matching frequency and distribution, as well as coincidence at the pragmatic level (Corpas Pastor 2003: 217), and none of these aspects were studied for the idioms at hand. Examples of full matches are mover ceo e terra (move heaven and earth), non mover un dedo (not move a finger), pechar filas (close ranks) and matar o mensaxeiro (kill the messenger); an example of a close match is poñer a outra meixela "to offer the other cheek" (turn the other cheek). In other cases, the images of the Galician and English expressions are quite different, with varying degrees of proximity, but this would not prevent the introduction of the Galician idioms if the English equivalents were in WordNet, as the denotative value of both is the same or very similar: perder o tren "to miss the train" (miss the boat, miss the bus), pagar o pato "to pay for the duck" (take the rap, pay the piper, carry the can), pasar factura "to send the bill" (take its toll), ver con outros ollos "to see with different eyes" (see in a new light), quedar en auga de castañas "to remain chestnut water" (come to nothing), etc.
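The two percentages given for the excluded idioms can be reproduced from the counts in the text. This check assumes, as the fact that 43% and 57% add up to 100% suggests, that the two basic reasons partition the 313 excluded idioms, so the second group is obtained by subtraction rather than reported directly.

```python
excluded = 313            # idioms not introduced into Galnet
not_lexicalised = 136     # no lexicalised English equivalent
# Assumption: every excluded idiom falls into exactly one of the two groups
missing_from_wordnet = excluded - not_lexicalised   # 177

for label, n in [("not lexicalised in English", not_lexicalised),
                 ("lexicalised but missing from WordNet", missing_from_wordnet)]:
    print(f"{label}: {n} ({n / excluded:.1%})")
```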
Whenever the English expression is missing from WordNet, the concept is also missing, and it is therefore impossible to introduce the corresponding idioms into Galnet. This is not surprising in light of our previous comments on how complex expressions carry complex, nuanced content. This type of content is usually conveyed only by multiword expressions and, as we mentioned in Section 1.2, such expressions are not particularly frequent in WordNet ("Currently, only few idioms are included in WordNet", Osherson, Fellbaum 2010: 3).

Verbal Idioms with Stative Value
Some of the verbal idioms in our corpus express stative values and can therefore be replaced by verbs with little semantic load, such as ser "to be", estar "to be" or quedar "to stay", followed by an attributive element.
In some of these cases, we found neither verbal idioms nor verbs in English with the same meaning (for example, the Galician expression non levantar un palmo "not to raise a palm above the ground", 'to be very short', 'to be very young'). WordNet may offer an entry for a nominal or adjectival element (short, young, knee-high to a grasshopper), but it is harder to find the combination of the verb to be with a noun or noun phrase, or with an adjective or adjective phrase (to be short, to be young, to be knee-high to a grasshopper), because such combinations rarely become lexicalised units. Obviously, Galician verbal idioms cannot be linked to these nominal or adjectival synsets in English, as the grammatical categories do not match. Other similar examples include caer de seu peso "to fall due to its own weight", 'to be very obvious or evident'; quedar para vestir santos "to remain to dress saints", 'to stay single [said of a woman]'; and comer as papas na cabeza "to eat porridge on somebody's head", 'to be taller [than somebody else]'.
In other cases, there is a lexicalised expression in English with an equivalent meaning (for example, have a good head on one's shoulders, equivalent to ter a cabeza no seu sitio "to have your head in the right place", 'to be sensible'). However, at least in the first versions of WordNet, such expressions did not stand much chance of entering the system, according to Fellbaum: Another type of VP idiom that does not readily fit into WordNet is that whose meaning can be glossed as be or become Adj. These idioms have the form of a VP but express states: hide one's light under a bushel and hold one's tongue mean "be modest" and "be quiet," respectively; flip one's wig; blow one's stack/a fuse, and hit the roof / ceiling all mean "become angry," and get the axe means be fired / dismissed. Similarly, the phrase one's heart goes out (to) can be glossed by means of the verb feel and the adjective phrase "sorry or sympathetic (for)." Such idioms pose a problem for integration into WordNet, not because of their form but because of the kinds of concepts they express. In WordNet, verbs (including copular verbs) and adjectives are strictly separated because they express distinct kinds of concepts. This separation is of course desirable and even necessary when one deals with non-idiomatic language, where the meaning of a phrase or sentence is composed of the meanings of its individual parts. Copular or copula-like verbs like be and feel combine with a large number of adjectives and there is no point in entering specific combinations into a lexicon (Fellbaum 1998b: 56).
Although Fellbaum (1998b: 56) points out that these expressions of stative content can be integrated as subordinates of stative verbs (for example, to be), only a few of the idioms she uses to illustrate this point in the quotation above are present in version 3.0 of WordNet: only those expressing the content 'to become angry', all of them in the same synset (ili-30-01795428-v in EuroWordNet), are included.

Idioms with Different Causative / Resultative Values
In some cases, we found variants in English synsets whose content was close to that of the Galician forms in our corpus, but differences between them prevented their connection through the same ILI. These were forms showing an opposition between causative and anticausative (or resultative) values.
The cause relation picks out two verb concepts, one causative (like give), the other what might be called 'resultative' (like have). English has lexicalized causative pairs like show-see and fell-fall, which are linked in Wordnet by the appropriate pointer. In addition, WordNet contains cause pointers from causative, transitive verbs to the corresponding anticausative (inchoative), intransitive sense of the same word (Fellbaum 1998a: 83).
This opposition is represented in EuroWordNet through the relations CAUSES and IS_CAUSED_BY (see Vossen 2002a), so that, in principle, verbs showing such an opposition, whether lexical or merely grammatical, should belong to different synsets, linked through these relations.
Although some exceptions to this procedure can be found in WordNet in practice (for example, for melt both values are in the same synset, whose gloss is 'reduce or cause to be reduced from a solid to a liquid state, usually by heating'; similarly, blacken is glossed as 'make or become black'), it does not seem advisable to group forms with clearly differentiated values: causative verbs denote an action that leads to a change, and resultative ones a process undergone by the entity affected by the change. These values are also clearly distinguished by specific relations in WordNet. Consequently, Galician idioms had to be excluded from Galnet if WordNet only included one of these values (causative or anticausative) and the expression in our corpus had the other. For example, we could not include idioms such as perder o xuízo "to lose one's judgement" and its synonyms, with resultative value ('go crazy'), as this concept is not present in WordNet, although it may be expressed in English through idioms such as lose one's mind or go out of one's mind or, of course, through free combinations such as go crazy / insane. Accordingly, the synset with the variants crazy and madden ('cause to go crazy; cause to lose one's mind') is not appropriate, as it has a clearly causative value.
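The decision rule described above can be sketched as a compatibility check between the value of an idiom and that of a candidate synset, following a cause link from a causative synset to its resultative partner when one exists. This is a simplified illustration: the class `VerbSynset`, its `value` and `causes` fields, all identifiers and the show/see glosses are hypothetical stand-ins for WordNet's cause pointers (EuroWordNet's CAUSES relation); the show/see pair comes from the Fellbaum quotation and the madden gloss from the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class VerbSynset:
    synset_id: str
    value: str                      # "causative" or "resultative" (simplified)
    gloss: str
    causes: Optional[str] = None    # id of the resultative synset this one CAUSES


def find_slot(idiom_value: str, candidate: VerbSynset,
              index: dict[str, VerbSynset]) -> Optional[VerbSynset]:
    """Return a synset whose value matches the idiom's, or None if none exists."""
    if candidate.value == idiom_value:
        return candidate
    # Follow the cause link from a causative synset to its resultative partner.
    if idiom_value == "resultative" and candidate.causes:
        partner = index.get(candidate.causes)
        if partner is not None and partner.value == idiom_value:
            return partner
    return None


INDEX = {
    "toy-show-v": VerbSynset("toy-show-v", "causative", "cause to see",
                             causes="toy-see-v"),
    "toy-see-v": VerbSynset("toy-see-v", "resultative", "perceive by sight"),
    # No resultative partner: 'go crazy' is not a WordNet concept
    "toy-madden-v": VerbSynset("toy-madden-v", "causative",
                               "cause to go crazy; cause to lose one's mind"),
}

# perder o xuízo is resultative ('go crazy'): no suitable synset is found
print(find_slot("resultative", INDEX["toy-madden-v"], INDEX))             # None
print(find_slot("resultative", INDEX["toy-show-v"], INDEX).synset_id)     # toy-see-v
```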

Negative Idioms
Our corpus of Galician idioms included 67 that are used in negative constructions and that dictionaries therefore tend to lemmatise with the particle non "not": non dar un chío, non deixar respirar, non haber volta de folla, etc. Of these 67 expressions, we managed to introduce only 14 into WordNet synsets, using two pathways. In most cases, we kept the negative element in the Galician variant and included the expression together with other variants that can freely appear in negative or affirmative sentences: non probar bocado "not to eat a bite" ('abstain from eating'; other variants: estar en xaxún, xaxuar / fast), non poder ver "to be unable to see" ('dislike intensely; feel antipathy or aversion towards'; other variants: detestar, odiar, etc. / detest, hate). As Palacios Martínez (1999: 67) notes regarding the affirmative content of some English negative idioms, "there is sometimes no correspondence between the syntactic structure of the idiomatic construction and the meaning expressed by it". The same can be said of Galician.
However, the Galician idiom non dar un peso "not to give a penny" was included without the negative particle (dar un peso) in the synset whose gloss is 'show no concern or interest; always used in the negative', and whose English variants are care a hang, give a damn, give a hang and give a hoot. As we can see, the restriction on the use of the expression is made explicit in the definition. Glosses of this kind have an advantage: the negative particle non may be absent and replaced by other negative words such as ninguén "nobody", nunca "never", etc., as in example (1). For this reason, introducing the idiom without non may be more suitable for the automatic retrieval of the expression. Nonetheless, this procedure also has a disadvantage: it does not allow expressions used only in the negative to be combined in a single synset with others that have no polarity restriction.
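The retrieval advantage of storing dar un peso without non can be illustrated with a small matcher that finds the idiom and then checks for a negative word in a short window before it. This is a minimal sketch under stated assumptions: the negator list contains only the three words cited above, the window of three tokens is arbitrary, and matching uses the citation form only, so a real system would also need lemmatisation to catch inflected forms of dar. The function name `negated_occurrence` and the example sentences are our own constructions.

```python
import re

NEGATORS = {"non", "ninguén", "nunca"}   # partial list: only the words cited in the text


def negated_occurrence(idiom: str, sentence: str, window: int = 3) -> bool:
    """True if `idiom` occurs in `sentence` with a negator among the
    `window` tokens immediately before it."""
    target = idiom.lower().split()
    tokens = re.findall(r"\w+", sentence.lower())   # \w also matches accented letters
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            preceding = tokens[max(0, i - window):i]
            if any(tok in NEGATORS for tok in preceding):
                return True
    return False


# Constructed example sentences (not taken from a corpus)
print(negated_occurrence("dar un peso", "Non quero dar un peso por iso."))   # True
print(negated_occurrence("dar un peso", "Ninguén vai dar un peso por el."))  # True
print(negated_occurrence("dar un peso", "Vai dar un peso por el."))          # False
```

Had the idiom been stored as non dar un peso, a literal match would miss the second sentence, where ninguén rather than non carries the negation.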
After searching for the string <negativ> in all WordNet glosses, and keeping only those in which this string restricts the use of the variants, we found just eight WordNet synsets (five of them verbal: ili-30-01107895-v, ili-30-01823149-v, ili-30-02507278-v, ili-30-02526509-v, ili-30-02726164-v) whose gloss indicates either that polarity must be negative or that negative use is more frequent. The scarcity of such glosses, combined with the fact that no English verbal idiom containing not is found in WordNet, and only two containing no (leave no stone unturned and make no bones about), may explain the low percentage of Galician negative idioms included in Galnet: Some idiom strings have surface forms that do not conform to any of the syntactic categories included in WordNet. For example, many idioms must occur with a negation: the VP give a hoot loses its (figurative) meaning in the absence of negation; the same is true for the VP hold a candle to. The negation must therefore be considered part of the idioms. But a verb phrase headed by negation is not a constituent recognized in WordNet (Fellbaum 1998b: 54).
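The two-step gloss search can be sketched as a substring scan followed by a filter. The filter shown here is a hypothetical regular-expression heuristic standing in for the selection of glosses in which the string restricts the use of the variants; the function names and toy identifiers are our own, and only the first toy gloss is quoted from the text. With NLTK installed, a real gloss dictionary could be built as `{s.name(): s.definition() for s in wn.all_synsets()}`, but the sketch does not depend on it.

```python
from __future__ import annotations

import re

# Toy glosses: only the first is quoted from the text; ids are invented
TOY_GLOSSES = {
    "toy-1": "show no concern or interest; always used in the negative",
    "toy-2": "charge with a negative electric charge",
    "toy-3": "make free from confusion or ambiguity; make clear",
}

# Heuristic: "used ... negative" within one gloss clause signals a polarity restriction
RESTRICTION = re.compile(r"\bused\b[^;]*\bnegative")


def negativ_candidates(glosses: dict[str, str]) -> list[str]:
    """Step 1: ids of glosses containing the string 'negativ'."""
    return sorted(sid for sid, g in glosses.items() if "negativ" in g.lower())


def polarity_restricted(glosses: dict[str, str]) -> list[str]:
    """Step 2: keep candidates whose gloss restricts the variants' polarity."""
    return [sid for sid in negativ_candidates(glosses)
            if RESTRICTION.search(glosses[sid].lower())]


print(negativ_candidates(TOY_GLOSSES))   # ['toy-1', 'toy-2']
print(polarity_restricted(TOY_GLOSSES))  # ['toy-1']
```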
One has to bear in mind that polarity need not be the same in the two languages (Palacios Martínez 1999: 66), but in fact several Galician negative expressions have an English negative counterpart that is not in WordNet. These expressions sometimes match the literal sense (for example, non dar crédito aos seus ollos: can't believe one's eyes; non dar cuartel: give / get no quarter) and sometimes do not (for example, non ser santo da devoción de alguén "not to be someone's preferred saint": not be someone's cup of tea).

Expressions Included in Galnet
As we highlighted in Section 3, 490 idioms, i.e., 61% of the phraseological units in the corpus, were introduced into Galnet. The introduction of a specific expression into a particular synset does not necessarily imply total equivalence between the Galician and English expressions at all linguistic levels, as we decided to apply a loose interpretation of synonymy. We considered this flexibility preferable to leaving a large set of idioms out, especially since this is the view of synonymy already found in WordNet (Miller 1998: 23-24; see the quotation in Section 1.2).
This flexibility can easily be verified by searching WordNet synsets, where we frequently find variants whose conceptual meanings or uses do not fully match: for example, under the gloss 'censure severely' we find the verbs castigate, chasten, chastise, correct and objurgate, which show semantic and register differences. When introducing verbal idioms from our corpus into Galnet, we applied exactly the same criterion (see, for example, the expression facer as súas necesidades in Section 4.3.3). In this section, we analyse three situations that fit this flexible interpretation of synonymy: expressions with different images, expressions with different combinations, and expressions belonging to different registers or carrying different expressive values.

Different Images
In a few cases, the Galician idiom we wanted to introduce into Galnet had an equivalent in the English synsets with the same denotative meaning and the same or a very similar image: botar raíces and take root, coller o touro polos cornos and take the bull by the horns, lavar as mans and wash one's hands, or botar luz and shed light on. In some of these cases, for example take the bull by the horns, the multiword expression is the only element in the English synset. However, including both multiword expressions and monolexical units in the same set seems to be more common practice: clear, clear up, crystalise, crystalize, crystallise, crystallize, elucidate, enlighten, illuminate, shed light on, sort out and straighten out ('make free from confusion or ambiguity; make clear').
As the previous example shows, the same English synset may contain variants based on different images. This type of clustering is frequent in WordNet, and it shaped the way we carried out our experiment. For example, among the variants expressing the idea of death in the synset glossed 'pass from physical life and lose all bodily attributes and functions necessary to sustain life', we find totally different images in expressions such as buy the farm, cash in one's chips, give up the ghost, go or pass. In line with this, we introduced expressions carrying disparate images, such as acabarse a corda "to finish the rope", botar a alma "to give up the soul", ir para a pataqueira "to go to the potato field" or pechar os ollos "to close the eyes", in the corresponding Galician synset. A similar example is the synset meaning 'flee; take to one's heels; cut and run', where we find English variants such as fly the coop, head for the hills, take to the woods or break away. In the Galician version, we incorporated figurative expressions with different images, such as chamar ós pés compañeiros "to call your feet your companions", darlles sebo ás canelas "to apply grease to your shins" or poñer terra de por medio "to put land between [two things or people]".
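To make the grouping described above more concrete, the following Python sketch models a synset as a single shared gloss plus per-language sets of variants. It is a hypothetical illustration, not the data model actually used in Galnet or WordNet: the `Synset` class, its fields and the `ili` string are invented placeholders, and only some of the English variants cited in the text are listed.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """Hypothetical, minimal model of a synset linked across languages."""

    ili: str    # placeholder interlingual identifier, not a real ILI record
    gloss: str  # a single gloss shared by every variant in the synset
    variants: dict[str, set[str]] = field(default_factory=dict)  # language code -> lemmas

    def add_variant(self, lang: str, lemma: str) -> None:
        # Membership depends on denotation only: the image behind an idiom is never checked.
        self.variants.setdefault(lang, set()).add(lemma)


die = Synset(
    ili="ILI-PLACEHOLDER-die",
    gloss=("pass from physical life and lose all bodily attributes "
           "and functions necessary to sustain life"),
)
# Some of the English variants cited in the text.
for lemma in ["buy the farm", "cash in one's chips", "give up the ghost", "go", "pass"]:
    die.add_variant("en", lemma)
# Galician idioms with disparate images join the same synset.
for lemma in ["acabarse a corda", "botar a alma", "ir para a pataqueira", "pechar os ollos"]:
    die.add_variant("gl", lemma)

print(sorted(die.variants["gl"]))
```

Because the gloss belongs to the synset rather than to individual variants, adding a Galician idiom with a new image does not alter what the English variants mean, which mirrors the loose interpretation of synonymy described in the text.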
We also introduced Galician idioms in Galnet when they coincided with the English forms in WordNet only at the denotative level, either because the Galician and English variants did not share an image (botar toda a carne no asador "to throw all the meat on the grill" vs. do one's best, give full measure, give one's best, go all out), or because the English synset did not include figurative expressions (dar á luz "to give to the light" vs. print, publish). In many cases, a similar English idiom exists but is simply missing from WordNet. For example, the synset glossed 'gain or regain energy' contains the English variants gain vigor, percolate, perk, perk up and pick up. We introduced the Galician idioms recargar pilas and cargar pilas in this synset, even though the English idioms charge batteries and recharge batteries are absent from WordNet.
Although this appears to be the usual pattern in the English synsets, there are exceptions that complicate the insertion of expressions from other languages into the corresponding WordNet versions. The first exception concerns glosses that explicitly incorporate an image, which obviously restricts the lexical units that can be placed in the corresponding synset. For example, the gloss 'draw in as if with a rope; lure' (here and in the following definitions, the italics are ours) is linked only to the English expression rope in. Other denotatively close units, such as draw in or lure, which appear in the definition itself, do not fit the synset because they lack the image. For the same reason, we cannot incorporate an idiom such as envolver nas redes "to wrap in the nets" in the corresponding Galician synset. Other cases where the definition includes the image are roost ('settle down or stay, as if on a roost') and perch / rest / roost ('sit, as on a branch').
Moreover, although this is never stated explicitly, a different image sometimes seems to justify separate synsets for expressions with identical or very close denotative values. This contributes to a disaggregation of synsets that has already been pointed out in more general terms: Dado que la red léxico-semántica WordNet se ha desarrollado desde la óptica de los conceptos y las representaciones mentales a veces se da una excesiva granularidad en el tratamiento de los sentidos, lo que se traduce en un gran número de synsets que, desde una óptica lingüística, se podrían considerar excesivos (Martí Antonín 2001: 76).
[Because WordNet's lexical-semantic network was developed from the point of view of concepts and mental representations, the treatment of meanings is sometimes too fine-grained. This leads to a large number of synsets that could be seen as excessive from a linguistic perspective.]
For example, the expressions blow a fuse, blow one's stack, blow up, combust, flip one's lid, flip one's wig, fly off the handle, go ballistic, have a fit, have kittens, hit the ceiling, hit the roof, lose one's temper and throw a fit evoke clearly different images, but they share a single synset (glossed 'get very angry and fly into a rage'). Other idioms and verbs related to anger are split across different synsets, and it is not obvious to the user why expressions are sometimes grouped and sometimes kept apart. For example, each of the following has its own synset:
foam at the mouth, froth at the mouth ('be in a state of uncontrolled anger')
steam ('get very angry')
fume ('be mad, angry, or furious')
sizzle ('seethe with deep anger or resentment')
raise the roof ('get very angry')
chafe ('feel extreme irritation or anger')
bridle ('anger or take offense')
Although the glosses of these synsets differ in most cases (those of steam and raise the roof, however, are identical), it is difficult to find a coherent semantic criterion that explains either the grouping or the separation. The meanings of these expressions certainly differ, partly because of the disparate images behind them: as Baránov and Dobrovol'skij (2009: 551) claim, "a forma interna das UUFF constitúe unha formación moi complexa que inflúe tanto no plano do contido dunha UF coma nas particularidades do seu uso" [the internal form of idioms is a very complex structure that influences both the content of an idiom and its use]. Yet if the image were the criterion, blow a fuse and lose one's temper would have to be separated, whereas they are in fact grouped under the same synset.
Introducing Galician expressions was thus made more difficult by the lack of a single criterion for establishing synsets, and by the fact that the variants belonging to the same semantic field are not always directly connected in a way that would give a complete overview of the synsets and their glosses. For example, raise the roof, bridle and steam are listed as hyponyms of anger / see red 'become angry'; anger / see red, together with fume and chafe, are hyponyms of feel / experience 'undergo an emotional sensation or be in a particular state of mind'; and foam / froth at the mouth are hyponyms of rage 'feel intense anger', which has no hypernym. In any case, when introducing the Galician idioms in Galnet we decided to respect the structure of WordNet whenever possible, so the expression botar fume "to release smoke" was linked to the ILI of the English verb fume, and botar escuma pola boca "to release foam from the mouth" to that of the idioms foam at the mouth and froth at the mouth.
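The hyponymy relations listed above can be written down as a small parent map, which shows that some anger-related synsets meet under feel / experience while the one under rage never connects to them. The sketch below is hypothetical: it encodes only the relations mentioned in the text, uses readable labels instead of real WordNet identifiers, and stops at feel / experience simply because the text goes no further.

```python
# Hypothetical parent map (hyponym -> hypernym) with only the relations cited in the text.
HYPERNYM = {
    "raise the roof": "anger / see red",
    "bridle": "anger / see red",
    "steam": "anger / see red",
    "anger / see red": "feel / experience",
    "fume": "feel / experience",
    "chafe": "feel / experience",
    "foam at the mouth / froth at the mouth": "rage",
    # "rage" has no hypernym, so it never appears as a key.
}

# Galician idioms linked to the synsets of their English counterparts, as described above.
GALICIAN_LINKS = {
    "botar fume": "fume",
    "botar escuma pola boca": "foam at the mouth / froth at the mouth",
}


def hypernym_chain(synset: str) -> list[str]:
    """Follow parent links upward until a synset with no recorded hypernym is reached."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def shared_ancestors(a: str, b: str) -> set[str]:
    """Synsets that appear in both upward chains (including a or b themselves)."""
    return set(hypernym_chain(a)) & set(hypernym_chain(b))


print(hypernym_chain(GALICIAN_LINKS["botar fume"]))
print(shared_ancestors("raise the roof", "foam at the mouth / froth at the mouth"))
```

An empty intersection, as in the second call, is exactly the situation described in the text: two groups of anger idioms with no common node that would help a translator see them as one field.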

Different Constructional Possibilities
Some Galician idioms were introduced in synsets whose concepts apply more generically than the idioms themselves. For example, gardar as costas "to guard [somebody]'s back" combines only with nouns or phrases with human reference, but it was introduced in the synset 'protect against a challenge or attack', which contains the English verbs defend, guard and hold, combinable with any kind of referent. Another example is entrar en calor "to enter in warmth", an idiom applicable only to living beings, which was introduced in the more generic synset 'get warm or warmer', realised as warm and warm up, which combine with both animate and inanimate referents.
In other cases, the differences have to do with the (in)transitivity of the Galician and English expressions. For example, the verbal idiom abrandar as pedras "to soften the stones" ('to arouse great compassion') is linked to the English verb move, glossed as 'arouse sympathy or compassion in'. Unlike the English verb and the other Galician variants in the same synset (conmover, emocionar or encoller o corazón), the idiom abrandar as pedras cannot combine with an element designating the person who is moved, since that functional slot is already filled by as pedras.
A similar case is levantar a lebre "to flush out the hare", included in the synset meaning 'make known to the public information that was previously known only to a few people or that was meant to be kept a secret', realised in English through the transitive verbs break, bring out, disclose, discover, divulge, expose, give away, let on, let out, reveal and unwrap, all of which combine with a free complement (as in the WordNet examples "he broke the news to her" and "the actress won't reveal how old she is"). We introduced the Galician expression in this synset even though the idiom already includes the object in its meaning ('to uncover something that should have remained hidden') and therefore takes no additional element, as example (2), from the Corpus de Referencia do Galego Actual (CORGA), shows (italics ours):
(2) Pero Herbie Stempel (John Turturro), un participante decepcionado, levantou a lebre (O Correo Galego 06/02/1995).
[But Herbie Stempel (John Turturro), a disappointed contestant, blew the whistle]
Other similar examples, with their WordNet glosses, are dar o brazo a torcer "to give your arm to be twisted" ('stop maintaining or insisting on; of ideas or claims'), volver a vista atrás "to turn your gaze back" ('look back upon (a period of time, sequence of events)') and saír de dúbidas "to come out of doubt" ('find the solution to (a problem or question) or understand the meaning of').
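The mismatch discussed in this subsection can be pictured by recording, for each variant, the argument slots that remain open in discourse. The following sketch is purely illustrative: the `Variant` class and the slot labels are hypothetical, the frames are simplified (for instance, the indirect object in "he broke the news to her" is ignored), and nothing of the kind is stored in Galnet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    lemma: str
    open_slots: tuple[str, ...]  # hypothetical labels for arguments still to be filled in discourse


# Simplified frames for variants of the 'make known ... kept a secret' synset.
REVEAL_SYNSET = [
    Variant("reveal", ("subject", "object")),
    Variant("divulge", ("subject", "object")),
    Variant("levantar a lebre", ("subject",)),  # the object is already inside the idiom
]

# Simplified frames for variants of the 'arouse sympathy or compassion in' synset.
MOVE_SYNSET = [
    Variant("move", ("subject", "object")),
    Variant("conmover", ("subject", "object")),
    Variant("abrandar as pedras", ("subject",)),  # as pedras fills the object slot
]


def valency_mismatches(variants: list[Variant]) -> list[str]:
    """Lemmas whose open slots differ from those of the first (reference) variant."""
    reference = variants[0].open_slots
    return [v.lemma for v in variants if v.open_slots != reference]


print(valency_mismatches(REVEAL_SYNSET))
print(valency_mismatches(MOVE_SYNSET))
```

The synsets keep all their variants regardless; a check like this would only flag where a user or a parser should not expect an extra complement.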

Different Registers and Expressive Values
The variants in a WordNet synset may belong to very different language varieties, and the relations established between them are independent of diaphasic, diastratic or diatopic variation. This applies both to the variants within a synset and to the equivalents in different languages connected through the same ILI: "Similar tests have been developed for every relation in E[uro]W[ord]N[et], in each of the different languages. Note that these tests are devised to detect semantic relations only and are not intended to cover differences in register, style or dialect between words" (Vossen 2002a: 13).
The same applies to the different expressive values of variants: "In EuroWordNet, both the pejorative and the neutral term are members of the same synset and may have a single ILI-record as equivalent" (Vossen 2002: 44).
Thus, just as the conceptual synonyms perish, die and kick the bucket are grouped in the same English synset, in the Galician version we introduced many colloquial and sometimes pejorative expressions (estirar a pata "to stretch the leg", ir para a pataqueira "to go to the potato field", recachar o rabo "to lift the tail", etc.) together with formal and standard verbs without any expressive load, such as perecer "perish" and morrer "die".
We introduced variants with different diasystematic marks regardless of whether the English expressions belonged to a single variety. For example, in the synset glossed 'give help or assistance; be of service', whose English variants are the standard forms aid, assist and help, we introduced both unmarked Galician variants (axudar "help") and colloquial ones (botar un cabo "to throw a rope").
However, WordNet makes some exceptions to this procedure. Some glosses explicitly state a diaphasic restriction on the items in the synset (our italics): 'informal terms for a difficult situation': fix, hole, jam, kettle of fish, mess, muddle, pickle; 'informal terms for insanity': craziness, daftness, flakiness. We believe that adding usage information to WordNet's lexical units can be very valuable (in fact, we are including diaphasic tags in Galician expressions, see Gómez Clemente et al. 2013), but mixing different criteria when processing conceptual synonyms does not seem desirable. In any case, we decided to respect the original structure of the English version as much as possible, grouping the Galician equivalents according to the usage information in the glosses and the distribution of the English synonyms.
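Keeping register information on each variant, instead of letting it split a synset, could look like the sketch below. The register labels and the `Variant` structure are hypothetical; the text only states that diaphasic tags are being added to Galician expressions (Gómez Clemente et al. 2013), not how they are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    lemma: str
    register: str  # hypothetical label: "formal", "standard" or "colloquial"


# One Galician synset for 'die': membership depends on meaning, not on register.
DIE_GL = {
    Variant("perecer", "formal"),
    Variant("morrer", "standard"),
    Variant("estirar a pata", "colloquial"),
    Variant("ir para a pataqueira", "colloquial"),
    Variant("recachar o rabo", "colloquial"),
}


def lemmas_with_register(variants: set[Variant], register: str) -> list[str]:
    """Filter a synset's variants by register without altering the synset itself."""
    return sorted(v.lemma for v in variants if v.register == register)


print(lemmas_with_register(DIE_GL, "colloquial"))
```

The design choice illustrated here is the one argued for in the text: register is queryable, but it never decides which synset a variant belongs to.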

Introducing Idioms in Galnet: Our System
In this section, we address three aspects of the method we used to introduce idioms in Galnet: first, how we processed semantically decomposable idioms; second, the reasons for not introducing combinatorial information on the idioms; and finally, the circumstances that led us to disaggregate some of the expressions in our corpus into several synsets.
Vincze et al. (2012) point out that some WordNet synsets contain components of idioms. They illustrate this with the synset for gutter, sewer and toilet ('misfortune resulting in lost effort or money'), nouns that, according to the authors, combine in this sense only with the verbs be and go. Another example they cite is seventh heaven or cloud nine, included in a synset glossed 'a state of extreme happiness' along with forms such as bliss or blissfulness. However, the former seem to be fixed in particular constructions, in seventh heaven and on cloud nine respectively, usually preceded by be. In fact, the expression be on cloud nine 'feel extreme happiness or elation' is also included in WordNet.

Processing Idioms that Are Semantically Decomposable
The direct introduction of the constituents of semantically decomposable fixed expressions as components of a synset is advocated by Osherson and Fellbaum: Treating idioms as "long words" in this manner is convenient in the case where the idiom is not composed of constituents that have a meaning, i.e., metaphors. But in many cases, the components of idioms can be said to be lexical items (form-meaning pairs) in themselves. For example, in spill the beans, the verb arguably carries the meaning "reveal" and beans refers to "secret or confidential information." Speakers assign such meanings to the idiom components, as can be seen by the fact that they modify them or substitute semantically similar items [...]. We need to reflect, first, the metaphorical status of such words and, second, the fact that their use is limited to the particular context of an idiom (Osherson, Fellbaum 2010: 3).
Thus, taking the authors' own example, their claim is that spill and beans, although linked, should belong to different synsets (the first to the synset of verbs meaning 'reveal' and the second to the synset of nouns meaning 'secret or confidential information').
However, these authors also point out some problems with this kind of treatment, such as idioms that have a fixed preposition as a component, since prepositions have no entries in WordNet. Vincze et al. (2012) also stress the problems of treating decomposable idioms in this way from a multilingual perspective. Consequently, and given that WordNet contains few examples of semantically decomposable idioms split across several synsets, we decided to give these idioms a single entry. Therefore, expressions such as levar a batuta "to take the baton" (where levar means 'exercise' and batuta 'command'), dar volta á tortilla "to turn the tortilla" (where dar volta is 'change' and tortilla 'situation') or xogar con lume "to play with fire" (where xogar is 'involve' and lume 'dangerous activity') are single units in Galnet and were not split into components distributed across several synsets (see Álvarez de la Granja 2003 on semantically decomposable Galician idioms).
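The practical obstacle mentioned above can be illustrated with a naive decomposition that maps each word of an idiom to the sense assigned to it in the text. This is a hypothetical sketch, not an implementation of Osherson and Fellbaum's proposal: the `COMPONENT_SENSES` table merely records the glosses quoted above, and whitespace tokenisation is a simplification (it could not handle a multiword component such as dar volta).

```python
# Hypothetical table of component senses, copied from the glosses given in the text.
COMPONENT_SENSES = {
    "levar": "exercise",
    "batuta": "command",
    "xogar": "involve",
    "lume": "dangerous activity",
}


def decompose(idiom: str) -> tuple[dict[str, str], list[str]]:
    """Split an idiom into words placed in a sense and words with nowhere to go."""
    senses: dict[str, str] = {}
    unplaced: list[str] = []
    for word in idiom.split():
        if word in COMPONENT_SENSES:
            senses[word] = COMPONENT_SENSES[word]
        else:
            # Prepositions and articles have no WordNet synset to receive them.
            unplaced.append(word)
    return senses, unplaced


# The single-entry alternative: the whole idiom is one lemma, so nothing is left over.
SINGLE_ENTRIES = {"levar a batuta", "dar volta á tortilla", "xogar con lume"}

print(decompose("xogar con lume"))
print("xogar con lume" in SINGLE_ENTRIES)
```

The leftover list is the crux: under decomposition, con in xogar con lume has no synset to go to, whereas a single entry sidesteps the problem entirely.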

Combinatorial Information
As expected, the idioms in our corpus adapt to discourse in different ways. Some of them, such as botar abaixo "to bring down" or coller nun renuncio "to catch [somebody] revoking (in a card game)", combine with a direct object and a subject. Others, such as ler a cartilla "to read the rules" or entrar polos ollos "to enter through the eyes", require an indirect object and a subject, while others, such as abrandar as pedras "to soften the stones" or entrar en calor "to enter in warmth", only require a subject. We are aware that knowing the syntactic behaviour and selection restrictions of these expressions is important for using them correctly. In fact, many dictionaries of idioms record this information in their headwords; see, for example, sacar en claro [u.p. algo] and crebarlle os fociños [u.p. a alguén] in the Dicionario de Fraseoloxía Galega by López Taboada and Soto Arias (2008), s.v. claro and fociño respectively, where u.p. is the abbreviation for "somebody" and always denotes the subject, algo means "something" and a alguén "[to] somebody". Nevertheless, we decided not to include combinatorial information in the presentation of the variants, for three reasons.
The first reason is practical: introducing this information would complicate the inclusion of variants in Galnet, as the selection restrictions of the expressions are not always easy to determine. Secondly, including a generic element (such as unha persoa "a person", algo "something" or alguén "somebody") that is replaced in discourse by more concrete forms could complicate the automatic recognition of idioms. Finally, introducing combinatorial information would be inconsistent with the Galician monolexical verbs, for which this information is not given.
This last reason also led us not to include prepositions in idioms that require one. In general, the governed preposition is not included in verb lemmas in Galnet (transformarse and not transformarse en; acceder and not acceder a; optar and not optar por), so, for consistency, we did not include it in the idioms either. For this reason, Galnet contains the variant facerse cargo (not facerse cargo de) and the variant beber os ventos (not beber os ventos por).
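The two conventions just described (no combinatorial slots, no governed preposition) amount to a normalisation of dictionary headwords into bare variants. The sketch below is a hypothetical illustration: the function name and the short preposition list (limited to a, de, en and por, the ones cited above) are assumptions, and a real tool would need a lexicon to avoid stripping a final word that genuinely belongs to an idiom.

```python
import re

# Hypothetical, deliberately short list: only the governed prepositions cited in the text.
GOVERNED_PREPOSITIONS = {"a", "de", "en", "por"}


def galnet_variant(headword: str) -> str:
    """Sketch: reduce a dictionary headword to a bare variant in the Galnet style."""
    # Drop bracketed combinatorial information such as "[u.p. algo]".
    lemma = re.sub(r"\s*\[[^\]]*\]", "", headword).strip()
    words = lemma.split()
    # Drop a trailing governed preposition, as in "facerse cargo de".
    if len(words) > 1 and words[-1] in GOVERNED_PREPOSITIONS:
        words = words[:-1]
    return " ".join(words)


for headword in ["sacar en claro [u.p. algo]", "crebarlle os fociños [u.p. a alguén]",
                 "facerse cargo de", "beber os ventos por", "acceder a"]:
    print(headword, "->", galnet_variant(headword))
```

Only a trailing preposition is removed, so internal prepositions that are part of the idiom, as in entrar en calor, are left untouched.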

Disaggregation
The initial corpus contained some formally identical idioms that appeared as separate entries whenever they had clearly different meanings or grammatical behaviours. For example, abrir os ollos "to open the eyes" has three entries, abrir os ollos (1), (2) and (3), corresponding to the meanings 'to disillusion [somebody]', 'to become disillusioned' and 'to wake up'. In addition, as we explained in Section 2, when compiling the definitive corpus we split each of the idioms facer as beiras and ver a luz into two entries corresponding to their two distinct meanings.
In other situations, however, we had a single entry and decided to distribute it across several synsets in Galnet, as we mentioned in Section 3. We proceeded in this way whenever we found different synsets that were close in meaning and whose glosses and/or English variants seemed appropriate for the meaning of the Galician idiom in the initial corpus. For example, the idiom dar pé "to give foot", already quoted in Section 3, was included both in the synset glossed 'cause to do; cause to act in a specified manner' and in the one defined as 'cause to happen, occur or exist', since the Galician expression accepts both kinds of concretisation of cause, as shown by the examples in the Tesouro Informatizado da Lingua Galega (TILG):
(3) non había que dar pé a que os molestasen (Lois Tobío, As décadas de T. L., 1994)
[there was no reason to give cause for them to be bothered]
(4) [...] Mundiños, 1996).
[there is a chapter that arouses the interest of many and gives rise to much speculation]
Thus, from the perspective of certain Galician idioms, some WordNet synsets reflect contextual nuances or sub-meanings without implying that these idioms have different meanings (at least from the point of view of Galician lexicographic practice). This led us to introduce them into several synsets in Galnet. Another rather evident example is the expression facer as súas necesidades "to do one's needs", usually defined in Galician dictionaries with a single generic definition (see, for example, 'defecar e/ou ouriñar' "defecate and/or urinate" in the Dicionario de Fraseoloxía Galega by López Taboada and Soto Arias 2008: s.v. necesidade). Since WordNet has no synset with this generic meaning, and given that expressions with different degrees of generality tend to be grouped, we chose to include it both in the synset glossed 'eliminate urine' and in the one glossed 'have a bowel movement', an option we considered preferable to excluding this rather frequent expression from Galnet. The remaining disaggregated idioms are listed in Appendix I.
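Storing the links from idioms to synsets as a one-to-many mapping, and inverting it when the synset view is needed, is one simple way to picture this kind of disaggregation. In the hypothetical sketch below, synsets are identified by their glosses for readability (real wordnets use identifiers), and the two idioms are just the examples discussed in this subsection.

```python
from collections import defaultdict

# Hypothetical links: each Galician idiom points to the glosses of every synset it joined.
IDIOM_TO_GLOSSES = {
    "dar pé": [
        "cause to do; cause to act in a specified manner",
        "cause to happen, occur or exist",
    ],
    "facer as súas necesidades": [
        "eliminate urine",
        "have a bowel movement",
    ],
}


def synset_index(links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert idiom -> glosses links into gloss -> idioms, the view a synset needs."""
    index = defaultdict(list)
    for idiom, glosses in links.items():
        for gloss in glosses:
            index[gloss].append(idiom)
    return dict(index)


index = synset_index(IDIOM_TO_GLOSSES)
print(index["have a bowel movement"])
```

Unlike abrir os ollos, which has separate numbered entries for separate meanings, each idiom here remains a single entry that simply appears in more than one synset.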

Conclusions
Once the experiment was completed, the Galician version of WordNet had been enriched with a large number of idioms. However, many expressions were left out of Galnet for two different reasons: the lack of a lexicalised concept in English (17% of the 803 idioms) and the lack of an English idiom in WordNet (22%). It should be noted that the cases where the lexicalised concept exists in English but is absent from WordNet outnumber those where there is no English lexicalisation at all. In fact, according to our estimates, 83% of the Galician expressions could find an English equivalent, provided, of course, that equivalence is interpreted very flexibly, as explained earlier, disregarding differences in image, evaluation or register. These results are in line with Corpas Pastor (2000: 483-484), who questions the supposed untranslatability of idioms and highlights the parallels between the idiomatic universes of different languages.
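A quick consistency check shows how the reported figures fit together. The sketch below takes the corpus size of 803 idioms and the 490 introduced idioms from the abstract; the two exclusion percentages are the rounded figures given in the text, so no underlying counts are reconstructed from them.

```python
TOTAL = 803     # idioms in the working corpus, as given in the abstract
INCLUDED = 490  # idioms introduced in Galnet

excluded = TOTAL - INCLUDED                     # size of the list of idioms left out
share_included = round(100 * INCLUDED / TOTAL)  # percentage introduced, rounded

NO_ENGLISH_CONCEPT = 17  # % with no lexicalised concept in English
NO_WORDNET_IDIOM = 22    # % whose English idiom is missing from WordNet

# The two exclusion reasons should cover everything that was not introduced.
assert share_included + NO_ENGLISH_CONCEPT + NO_WORDNET_IDIOM == 100

# Expressions with some English equivalent: those introduced plus those whose
# English idiom exists but is absent from WordNet.
with_equivalent = share_included + NO_WORDNET_IDIOM

print(excluded, share_included, with_equivalent)
```

The excluded count, 313, matches the number of idioms in the appendix table of expressions that could not be introduced, and 83% is simply the complement of the 17% with no English lexicalisation.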
The Princeton WordNet includes few idioms, which has prevented a large share of the idioms in our corpus from being introduced in Galnet. Moreover, complex expressions have not always been introduced in the English version in a coherent and systematic way, which hinders the work of those responsible for wordnets in other languages when they look for correspondences.
This does not mean, of course, that idioms cannot enrich the different versions of WordNet, as we have shown with Galnet: linking a complex expression to an ILI does not require the presence of multi-verbal units in the English synsets. On the other hand, wordnets integrated in EuroWordNet can always create new synsets for concepts absent from the English WordNet (Vossen 2002a, Vossen 2002b), but this possibility lies beyond the objectives of the Fralnet experiment and has not yet been explored for Galnet.
If wordnets are to be optimally useful for natural language processing and machine translation, idioms have to be included: "The need for M[ulti]W[ord]E[xpressions] lexicons grows even more acute for multi-lingual applications, for which (sometimes complex) correspondences must be identified, classified, and recorded" (Calzolari et al. 2002: 1934). This work shows that such an optimisation is possible even while remaining faithful to the current structure of the Princeton WordNet, and we hope it will serve as a stepping stone for expanding wordnets in other languages with idioms.
Acknowledgments: This research has been supported by the Xunta de Galicia and the European Union (under grant GRC2013/40) and by the Ministry of Economy and Competitiveness of the Spanish Government (Project TUNER: Automatic domain adaptation for semantic processing, TIN2015-65308-C5-1-R).
included the variant ser unlla e carne or synonyms comer no mesmo prato and levarse coma o pan e o leite.

List of Idioms from the Corpus Not Introduced in Fralnet
This table includes the 313 idioms from the working corpus that we could not include in any of the existing WordNet synsets. Some of them have a number in parentheses. This happens when the expression has several entries in the work we used to create the corpus (Appendix of As locucións verbais galegas [Galician Verbal Idioms], Álvarez de la Granja 2003); the number identifies the entry, and thus the corresponding meaning, in that book.
declarar a guerra