Terminology as a source of difficulty in translating international legal discourses: an empirical cross-genre study

Despite the persistent focus on terminology in legal translation studies, to date, no large-scale research has empirically explored the difficulty of terminology in translating legal genres. Approaches to translation difficulty in translation studies more broadly remain limited in scope. To fill this gap, a study was conducted to measure the difficulty associated with the translation of legal terminology and phraseology, as well as with terminology of other domains, in the LETRINT 1+ corpus, which includes nine representative genres from three institutional settings (the European Union, the United Nations and the World Trade Organization). For comparative purposes, four levels of translation difficulty were assigned to multiple terminological features by a group of specialized translators through a consensus-building process of annotation based on the cognitive effort estimated for translation decision-making. The difficulty scores obtained confirm the correlation between legal singularity and higher translation difficulty, as well as the association of more commonly used legal terms and phrasemes, and of core economic terms, with lower difficulty levels. The findings also provide evidence of the prominence of non-legal specialized terminology in institutional legal discourses and establish the aggregate terminological difficulty levels of each genre examined, which can be particularly useful for informing translation quality assurance, project management and translator training.


Introduction
As highlighted by Cao (2007: 53), terminology is "the most visible and striking linguistic feature of legal language as a technical language" and "one of the primary sources of difficulty in translating legal documents". While terminology has been a major focus of research in Legal Translation Studies (LTS), the field lacks empirical studies of the nature and translation-oriented difficulty of terminology in legal genres, including both legal and other specialized terms. Case studies tend to centre on selected legal terminology in specific branches of law, legal semantic fields or legal genres. However, there is consensus on the thematic diversity of law and its interactions with other fields of knowledge, which suggests that specialized language from other domains may be as prominent as legal discourse features in legal texts (Prieto Ramos 2019: 34). To what extent is this the case and what level of difficulty does it present for translators?
If applied to a genre or a field of translation practice, this question requires a granular description of terminological features and their associated translation-oriented difficulty in a representative corpus of texts. Previous approaches to translation difficulty have included lexical items or terminology among sources of text difficulty for translation (e.g. Nord 2005: 168; PACTE 2011: 327). To date, however, only a few approaches, usually focused on a limited textual scope, have been empirically tested (e.g. Campbell 1999; Hale and Campbell 2002; Sun and Shreve 2014), and no research has explored the difficulty of the terminology of multiple knowledge fields in professional translation decision-making from a cross-genre comparative perspective. This is one of the aims of the LETRINT project on legal and institutional translation. 1 The project examines the scope, discourse features and translation patterns in this area with a view to establishing connections between process, competence and product adequacy according to a holistic approach to translation quality (Prieto Ramos 2015). In this context, terminological and phraseological features are considered key components of institutional discourses and translation decision-making. Accordingly, they must play a central role in studies of legal translation difficulty.
Lexical features have also been integrated in approaches based on text readability formulas, together with sentence length or structural complexity (see e.g. Jensen 2009; Mishra et al. 2013). As noted by Sun and Shreve (2014: 99-100, 116-117), while "vocabulary difficulty" stands out as a prominent difficulty factor, readability formulas are unreliable predictors of translation difficulty as they focus on comprehension indexes, but tend to omit transfer difficulty factors. As will be explained in the next section, our approach concentrates on the cognitive effort applied in decision-making, as measured by a group of specialized translators for the institutional scenarios selected in the LETRINT project.
The project explores the correlation between legal singularity and translation difficulty. In legal translation more generally, there is a persistent interest in the translation challenges arising from legal singularity and related instances of incongruence between source and target legal systems. These issues are often associated with higher degrees of translation difficulty and thus potential "rich points" in translation, 2 which in turn are of particular interest for analyzing translation adequacy levels and their implications for translation competence and institutional quality assurance (see forthcoming LETRINT outputs). However, the correlation between legal singularity and translation difficulty has yet to be empirically tested on a large scale. It is also hypothesized that the terminological hybridity of international legal texts and the translation difficulty associated with their terminological features vary according to institutional setting and genre.
To shed light on this variability in a representative selection of organizations and genres, the project focuses on the text production of several supranational and intergovernmental settings: the four main EU institutions (the Commission, the Parliament, the Council and the Court of Justice [CJEU]), the United Nations (UN) and its International Court of Justice (ICJ), as well as the World Trade Organization (WTO). Section 2 offers further details on the corpus and the approach used in the study, whereas Section 3 summarizes the findings on translation difficulty per discourse feature, institutional setting and legal genre, along with their corresponding aggregate difficulty levels considering the density of the discourse categories annotated in each genre.

Corpus and approach
The LETRINT 1+ corpus has been used for the analysis of discourse features and associated translation difficulty levels. This corpus is the result of a multilayered sequential approach to corpus building adopted by the LETRINT project on the basis of a full mapping and categorization of institutional text production from a legal perspective (see Prieto Ramos 2019; Prieto Ramos et al. 2019). Quantitative and qualitative criteria were applied through stratified sampling techniques with a view to ensuring a representative and comparable collection of genres of international law translated in the abovementioned EU and multilateral settings.

2 On the use of "rich point" in Translation Studies more broadly, see e.g. Nord (1997: 25) and PACTE (2009: 212-216). The concept was borrowed from anthropology. Michael Agar defined it as "things - from lexical items through speech acts up to fundamental notions of how the world works -" that "strike you with their difficulty, their complexity, their inability to fit into the resources you use to make sense out of the world" (1991: 168).
The corpus covers three major legal functional categories found in all these settings: law- and policy-making (category 1), compliance monitoring (category 2) and adjudication (category 3). The textual units compiled include samples from three years (2005, 2010 and 2015) belonging to the most prominent genres of each setting and legal function. The LETRINT 1+ corpus is thus composed of nine sub-corpora, as listed in Table 1, and a total of 752,061 tokens and 256 textual units (TUs). It is a parallel trilingual corpus including the English, French and Spanish texts of each multilingual document, except for the documents compiled from the ICJ (i.e. UN-3), which are only available in English and French. For the purposes of analysis of cross-cutting patterns according to primary legal functions, the three sub-corpora of each function, i.e. one per institutional setting, are further grouped together to form a functional set of sub-corpora, namely: L1+(1) for law- and policy-making, L1+(2) for compliance monitoring and L1+(3) for adjudication.
To ensure both representativeness and feasibility, a threshold of approximately 90,000 tokens and a minimum of 12 TUs were established for each sample, except for the WTO law-making sub-corpus (WTO-1), which includes all the 32,121 tokens of textual production for this category and did not require any downsizing. In all cases, the original versions (i.e. English texts, except for the CJEU, where the original language of judgments is French) were used both for the word counts and for the subsequent annotation of discourse features and translation difficulty levels.
The corpus was annotated manually with UAM Corpus Tool, version 3.3 (O'Donnell 2019), by reading each text for the purposes of translation in each institutional setting. In order to compare perceptions of difficulty from the perspective of two different target languages, two annotation teams were created, each of them consisting of a French language translator and a Spanish language translator. 3 This was essential to enrich the analysis of the original discourse features and identify divergences in annotation that may be related to the legal traditions in each language.
All the annotators complied with a series of profile criteria tailored to the project needs: they were qualified translators with a specialization in legal translation (encompassing law components as part of their translation degrees or additional training in law) and a minimum experience of 100,000 words of professional translation, including legal and institutional translation assignments. These requirements ensured familiarity with legal translation issues and institutional translation standards. To avert bias with regard to specific genres or settings, deliberate efforts were also made to recruit translators with complementary profiles and exposure to a diversity of translation scenarios rather than with experience in a single organization.
Although preliminary annotation tests revealed a high level of convergence in applying LETRINT's annotation taxonomies, a thorough validation process was established to reach consensus between the annotators. All annotations were double-checked within and between teams in order to identify and resolve inconsistencies or discrepancies. The annotation results were then examined by the project leader for further revision, group discussion and final adjustments across genres where appropriate. This process also informed gradual refinements of certain annotation categories based on the analysis of borderline instances of categorization during the first stages of corpus annotation.
Two sets of categories were systematically used for annotation: discourse features and translation difficulty levels. The first taxonomy includes the following categories:

Legal terminology (LEG-T)
- LEG-GEN: Terms used to refer to concepts that are perceived as generic and common to multiple national and international legal systems (e.g. "legislation", "appeal").
- LEG-INT: Terms coined in the international legal order and recognized as established terminology within the scope of competence of a particular organization (e.g. "acquis communautaire", "tariff escalation").
- LEG-NAT-SIN: Terms designating singular concepts that are specific to national legal systems or traditions (e.g. "magistrates' court", "Chancery Division").
- LEG-NAT-GEN: Names of national, regional or local bodies, instruments or positions that are common to multiple legal systems and traditions (e.g. "Parliament", "Constitution").

Terminology of other specialized domains (SPEC-T)
- ECO-T: Terms related to economics, employment, trade and business (e.g. "subsidy", "labour market").
- FIN-T: Terms used in finance, including fund management, banking and budgets (e.g. "accounting record", "stock").
- POL-T: Terms that refer to politics, government structures, social and other public policies, and administrative matters (e.g. "competent authority", "civil society organization").
- SCI-T: Terms used in the natural sciences, including physics, chemistry and biology (e.g. "butyric acid", "genetically modified organism").
- TEC-T: Terms that designate technical applications of science, including machines, processes and materials used in industry, transport and communications (e.g. "chemical tanker", "carbon sequestration").

Institutional titles (ITT)
- ITT1: Titles of international legal instruments, official documents or cases (e.g. "Convention on the Rights of Persons with Disabilities", "Treaty on the Functioning of the European Union").
- ITT2: Established names of institutional bodies, positions and institutional events, programmes or processes (e.g. "European Food Safety Authority", "Doha Round").

Legal phraseology (LEG-P)
- LEG-P1: Phrasemes, including prepositional phrases and other lexical collocations, that characterize legal discourses (e.g. "pursuant to", "without prejudice to").
- LEG-P2: Established expressions or formulas that can be identified as genre conventions and contribute to primary text functions, often of a performative nature (e.g. "is amended as follows", "shall enter into force on").
A distinction was established between LEG-NAT-SIN and LEG-NAT-GEN terms for the purposes of discriminating levels of legal singularity and associated translation difficulty within national legal terminology. When a concept was used in a particular field of specialization but also had a legal meaning as part of a branch of law, the nature and origin of the term were analyzed in order to determine the most appropriate category. Research was also necessary in instances where the international or national origin of the term was not clear. Institutional titles were isolated and divided into two categories: legal instruments and other documents (ITT1) and institutional bodies, programmes and positions (ITT2). This was decided in light of their nature as multi-word units that are normally established in each institution and may contain domain-specific references (for example, the title of a legal act referring to a technical subject). While they could be associated with LEG-INT, they deserved specific attention for translation decision-making analyses. Finally, based on Gries's (2008) definition of phraseologisms 4 and previous studies of legal phraseology (e.g. Kjaer 2007), our corpus-driven tests confirmed the use of two types of phraseology in the legal genres analyzed: simple phrasemes commonly found in legal discourses (LEG-P1) and more complex formulaic expressions used in specific genres (LEG-P2).
As for the translation difficulty levels to be assigned to the abovementioned terminological and phraseological features, a scale was developed for the purposes of the LETRINT project using cognitive effort as a yardstick. In line with Hale and Campbell's definition of text difficulty for translation "as a function of the cognitive effort required to process the item in question and convert it into the target language" (2002: 15), the LETRINT approach considers the effort needed for the comprehension and reformulation of each annotated unit, including, crucially, the amount of research required for translation decision-making. As a result, four difficulty levels were established:
- No difficulty (DIF-0): the unit is well-established and easy to understand and reformulate; it does not require any research.
- Low difficulty (DIF-1): the unit is easy to understand and reformulate; it only requires simple verification and limited time for decision-making (e.g. standardized terminology included in reliable institutional sources).
- Medium difficulty (DIF-2): the source meaning can be grasped but several searches may be necessary to understand all nuances and/or reformulate the unit, for example, due to the level of technicality or the diversity of potential reformulations (e.g. more than one rendering suggested in institutional resources or previous translations).
- High difficulty (DIF-3): the unit is of significant complexity for comprehension or reformulation, for example, due to its singularity or lack of translation precedents; it demands more extensive research beyond institutional resources (e.g. singular concept unique to a specific culture and/or borrowed in a third language).
The uniform interpretation and application of this framework within the annotation team was critical to avert the risk of bias or reliance on personal intuition. This common understanding was reinforced by the suitability and comparability of annotators' profiles, on the one hand, and by the consensus-building process and multiple verifications conducted to validate annotations, on the other, as explained above. In turn, this proved essential to keep individual translator competence factors stable and to anchor difficulty assessment in semantic analysis and translation process factors (for a review of relevant factors, see Sun 2015). As opposed to experimental studies that measure inter-rater variations between different groups of participants with regard to a small number of text passages, this workstream of the LETRINT project was purposefully designed to provide more generalizable results by characterizing terminological difficulty in a number of legal genres through the annotation of a large amount of text by a coordinated team over more than a year. The only inter-annotator discrepancies allowed and registered for the calculation of average difficulty levels were those attributable to the different target languages and cultures within the annotation team (e.g. differences confirmed as language- or culture-bound after explicit discussion between the French-language annotator and the Spanish-language annotator). In such cases, the average difficulty level for the unit was registered (e.g. DIF-1.5 when the confirmed input values were 1 and 2). This was essential to account for differences in how specific concepts are perceived, or for asymmetries in the comparative analysis required for the reformulation of singular legal terms in particular.
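The averaging rule for confirmed language-bound discrepancies can be illustrated with a short Python sketch. It is not the project's tooling: the function names `consolidated_score` and `share_averaged` and the sample score pairs are hypothetical, and the input is assumed to be the French-language and Spanish-language scores that remain after the consensus process, so that any remaining difference is a confirmed discrepancy.

```python
from statistics import mean


def consolidated_score(fr_score: float, es_score: float) -> float:
    """Return the difficulty registered for one annotated unit (hypothetical helper).

    Assumes both scores are on the DIF-0 to DIF-3 scale and that any remaining
    difference is a confirmed language- or culture-bound discrepancy.
    """
    for score in (fr_score, es_score):
        if not 0 <= score <= 3:
            raise ValueError(f"difficulty out of range: {score}")
    # Identical scores return the same value; differing ones return the
    # midpoint, e.g. 1 and 2 give 1.5 (registered as DIF-1.5).
    return (fr_score + es_score) / 2


def share_averaged(pairs: list[tuple[float, float]]) -> float:
    """Percentage of units whose registered score averages two differing scores."""
    differing = sum(1 for fr, es in pairs if fr != es)
    return 100 * differing / len(pairs)


# Hypothetical (French, Spanish) scores for five units of one category.
pairs = [(2, 2), (1, 2), (2, 2), (1, 1), (0, 1)]
scores = [consolidated_score(fr, es) for fr, es in pairs]
print(scores)                 # [2.0, 1.5, 2.0, 1.0, 0.5]
print(share_averaged(pairs))  # 40.0
print(mean(scores))           # 1.4
```

The share computed by `share_averaged` corresponds to the kind of figure reported per category for DIF-0.5 and DIF-1.5 units.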

Results
This section presents the results of the annotation work that was conducted to determine the difficulty of translating terminological and phraseological features. An overview of the density and difficulty of these features is first provided (Sections 3.1 and 3.2), before focusing on their difficulty levels per institutional setting and primary legal function (Section 3.3). Finally, the aggregate difficulty resulting from the combination of density and difficulty findings is outlined (Section 3.4) prior to the closing discussion.

Density of discourse features
Two datasets were produced: one including the density of discourse features considering all annotations of each unit in each text, and another disregarding such repetitions, i.e. the unit is counted once when multiple occurrences of the same unit are annotated within the same text (see comparative overview in Figure 1). In order to obtain comparable data, all the results were normalized to 1,000 tokens. Given the focus on cognitive effort for the purpose of this research, the second set of data (see breakdown in Table 2) was essential to subsequently calculate aggregate difficulty values, as it is presumed that the translation-oriented analysis of a term or phraseme is generally conducted the first time it appears in a text rather than for each occurrence.
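The two density measures can be sketched in Python as follows. This is a minimal illustration, assuming annotations are available as (text, category, unit) triples with a token count per text; the function `densities` and the toy data are hypothetical and do not come from the LETRINT annotation files. A unit repeated within one text counts once in the second measure, but again if it reappears in another text.

```python
from collections import Counter


def densities(annotations, tokens_per_text):
    """Return {category: (all annotations, unique units)} per 1,000 tokens.

    annotations: iterable of (text_id, category, unit) triples, one per
    annotated occurrence (hypothetical input format).
    """
    total_tokens = sum(tokens_per_text.values())
    all_counts = Counter()
    unique_units = set()
    for text_id, category, unit in annotations:
        all_counts[category] += 1
        # The same unit in the same text is stored only once.
        unique_units.add((text_id, category, unit))
    unique_counts = Counter(category for _, category, _ in unique_units)
    scale = 1000 / total_tokens  # normalize to 1,000 tokens
    return {
        category: (all_counts[category] * scale, unique_counts[category] * scale)
        for category in all_counts
    }


# Toy data: two short texts (illustrative only).
tokens_per_text = {"t1": 1200, "t2": 800}
annotations = [
    ("t1", "LEG-GEN", "appeal"),
    ("t1", "LEG-GEN", "appeal"),
    ("t1", "LEG-GEN", "appeal"),
    ("t1", "LEG-GEN", "legislation"),
    ("t2", "LEG-GEN", "appeal"),
    ("t2", "ECO-T", "subsidy"),
]
print(densities(annotations, tokens_per_text))
# {'LEG-GEN': (2.5, 1.5), 'ECO-T': (0.5, 0.5)}
```

The gap between the two LEG-GEN figures in the toy output mirrors, on a small scale, the gap reported for LEG-GEN in the full corpus.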
The opposite presumption could have led to a number of flawed indicators, especially for the discourse categories with very high or very low ratios of repetitions. The most remarkable case is that of LEG-GEN terms (and, to a lesser extent, LEG-INT units), which are understandably very frequent across the selected legal genres (39.75 annotations per 1,000 tokens, as opposed to only 3.58 disregarding repetitions). At the other extreme, in line with their more singular nature, LEG-NAT-SIN terms are the least repeated within a text, and thus proportionally demand more fragmented decision-making (3.29 annotations, or 3.05 without repetitions, per 1,000 tokens).
As mentioned above, LEG-NAT-SIN unit repetitions are the most limited of all categories, which means that the density of these terms without counting repeated occurrences in a text (3.05 per 1,000 tokens) is very close to the density of total annotations (3.29). In other words, the effort required to translate a singular term from a national legal system tends to apply to fewer occurrences per 1,000 tokens. As the results corroborate, these terms are essential in describing implementation matters at the national level, most often in L1+(2) genres. POL-T is the next category, with 2.92 units and 12.46 annotations per 1,000 tokens. These values, the second most frequent within SPEC-T, are explained by the prominence of POL-T in the UN sub-corpora, in line with the organization's work on human rights and humanitarian issues.
Two other distinctive features of international legal discourses, LEG-INT and ITT2, both with averages of 1.68 units per 1,000 tokens, converge more significantly between institutional settings and legal functions, but are also more frequent in the UN than in the other two organizations. This applies to ITT2 in particular, which reflects the wider diversity of agencies, programmes and positions established in this organization through its longer history. In contrast, the UN sub-corpora registered the lowest values for all the other SPEC-T categories, including the remaining three, whose average frequencies in LETRINT 1+ rank next after ECO-T and POL-T: TEC-T (1.61 units per 1,000 tokens), FIN-T (1.39) and SCI-T (1.17). The highest scores of these categories are found in EU-1, EU-2 and WTO-2. TEC-T and SCI-T are particularly prominent in the first two EU sub-corpora.
The remaining categories are less frequent, with density values below one new unit per 1,000 tokens, disregarding repetitions: LEG-P2 (0.77), LEG-P1 (0.56) and LEG-NAT-GEN (0.46). LEG-P2 formulations stand out in UN-1 and, especially, UN-2, as opposed to the absence of such formulas in the stylistically diverse ICJ Judges' opinions (UN-3). LEG-P1 phrasemes are systematically more frequent in law-making genres (L1+(1)), while LEG-NAT-GEN terms appear more often in texts related to implementation matters, along with LEG-NAT-SIN, but in more marginal proportions.
The richest sub-corpora from a terminological perspective are, by order of density, WTO trade policy review reports (WTO-2), UN human rights treaty body reports (UN-2) and EU regulations (EU-1). The first two genres registered the highest frequencies within their institutions' top SPEC-T categories (ECO-T in the case of WTO-2 and POL-T in UN-2) and also of LEG-NAT-SIN terminology, a prominent feature of monitoring functions; while EU-1 has the most diversified distribution of SPEC-T terminology of all law-making genres, including the top scores of TEC-T and SCI-T. This reflects the thematic diversity of EU law and the frequently technical nature of its legal acts.

Overall difficulty levels per discourse feature
As for the average difficulty scores of the annotated features, the most frequent category within legal terminology, LEG-GEN, scored the lowest average difficulty per annotation of all the terminological features, 0.87 (see Figure 2). This can be explained by the extended use of, and translators' acquaintance with, many of these terms, which rarely require much research. More precisely, 19.29% of these were considered of no difficulty (e.g. "rights", "legislation"), i.e. the second highest DIF-0 proportion of all the categories after LEG-P1; 74.27% reached level 1 (e.g. "force majeure", "legal uncertainty"), and 6.31% were assigned level 2 (e.g. "law enforcement authorities", "compensatory remediation") (see Figure 3 and Table 3).
LEG-INT scored an average of 1.02, as the majority of units (96.47%) were placed at DIF-1 level; their translation usually involves verifying institutional resources (e.g. "MFN treatment" in the WTO sub-corpora or "sustainable development agenda" in UN texts). Only 2.75% reached level 2. The largest shares of DIF-2 terms are found within the two terminological categories associated with national legal systems, which registered the highest difficulty values of all the features, with a marked difference according to legal singularity: average of 1.24 for LEG-NAT-GEN and 1.68 for LEG-NAT-SIN. Almost two thirds (62.05%) of the terms of the latter category are considered of medium difficulty (DIF-2, e.g. "VAT and Duties Tribunal" in the context of the United Kingdom), as opposed to 29.34% of terms of low difficulty (DIF-1, e.g. Czechia's "Beneš Decrees"). This is the only terminological category in which no unit was considered of no difficulty, as well as the discourse feature that includes the most significant share of DIF-3 units (e.g. "Zhogorku Kenesh" in a document about Kyrgyzstan), albeit very small (0.94%). Both findings are very telling. Like the rest of the LEG-T results, they also help to confirm the correlation between legal singularity and translation difficulty.
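As a consistency check, the reported averages can be approximately recovered from the reported distributions as a share-weighted mean of the difficulty levels. The sketch below uses the LEG-GEN and LEG-INT percentages given above; the helper `mean_from_shares` is hypothetical, and levels whose shares are not reported (for instance averaged DIF-0.5 or DIF-1.5 units) are simply left out, so the result is only an approximation.

```python
def mean_from_shares(shares: dict[float, float]) -> float:
    """Approximate mean difficulty from percentage shares per difficulty level.

    Levels missing from `shares` contribute nothing, so this is an
    approximation whenever the reported shares do not add up to 100%.
    """
    return sum(level * pct for level, pct in shares.items()) / 100


# Shares (%) per difficulty level as reported for LEG-GEN and LEG-INT.
leg_gen = {0: 19.29, 1: 74.27, 2: 6.31}
leg_int = {1: 96.47, 2: 2.75}
print(round(mean_from_shares(leg_gen), 2))  # 0.87
print(round(mean_from_shares(leg_int), 2))  # 1.02
```

Both approximations match the averages reported in the text (0.87 and 1.02), which indicates that the small unreported shares have little effect on these two categories.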

Terminological difficulty in institutional legal translation
The reverse proportions of DIF-1 (62.18%) and DIF-2 (28.79%) apply to LEG-NAT-GEN, which reflects the comparatively easier comprehension of these terms as a rule, although they may also demand several queries for adequate contextualization and reformulation (i.e. DIF-2), for example, in the case of English denominations translated from third languages, such as "Judicial Courts of the First Instance" (with reference to Cape Verde) or "appeal in cassation" (in the Lithuanian legal system).
Unsurprisingly, LEG-NAT categories also registered the most significant number of discrepancies between difficulty perceptions according to target language, with 6.81% and 4.94% of DIF-1.5 units among LEG-NAT-SIN and LEG-NAT-GEN, respectively. In all the other categories, units with averaged scores (reflecting differing French-language and Spanish-language translators' scores) account for less than 1%. All these averaged scores lie either between DIF-0 and DIF-1 (average of 0.5) or between DIF-1 and DIF-2 (average of 1.5). There was no instance of discrepancy between high difficulty (DIF-3) and lower difficulty levels, which means that DIF-3 units were found to be particularly complex by all annotators, regardless of their target language.
As for the terminology of other domains, average values fluctuate within a much more limited range, between 0.87 (ECO-T) and 1.07 (FIN-T). The score of the most recurrent category, ECO-T, is not only the lowest SPEC-T difficulty average, but also the second lowest of all the annotation categories (together with LEG-GEN and only higher than that of LEG-P1). This result is primarily determined by the significant proportion of ECO-T terms that are of standardized use and easily grasped by translators on a regular basis (e.g. "demand", "export"). In comparison with other specialized domains, this also suggests that legal translators are familiar with core economic concepts frequently referred to in international contexts. Indeed, 82.79% of the units are of low difficulty, while 14.25% qualified as DIF-0 and only 2.07% reached DIF-2. FIN-T registered a similar proportion of DIF-1 terms (85.08%, e.g. "asset", "financial performance"), but also the largest percentage of DIF-2 units within all SPEC-T categories (9.86%, e.g. "capital buffer", "default fund") and the lowest of DIF-0 (4.75%, e.g. "investment", "budget").
In the case of POL-T, the proportion of low difficulty terms remains predominant (83.27%), including multiple established terms, such as "sustainability" and "stakeholder". However, DIF-0 and DIF-2 scores converge, with as many level 0 units as level 2 units (both at 7.90%). This result reflects the need to verify the semantic nuance and translation precedents of certain terms referring to political, social or administrative matters in institutional settings (e.g. "children in street situations", "family reunification").
TEC-T and SCI-T are found in a middle position, with averages of 0.97 and 0.98, respectively. Their difficulty scores and distribution are very similar, including a majority of approximately 92% of DIF-1 terms (e.g. "wastewater treatment plant" within TEC-T and "bovine spongiform encephalopathy" within SCI-T), 5.92-4.90% of DIF-0 terms (e.g. "battery" within TEC-T and "biodiversity" within SCI-T) and 1.92-2.96% of DIF-2 terms (e.g. "lateral flow immunoassay" within TEC-T and "protease resistant PrPSc" within SCI-T).
As for institutional titles, the great majority of units only involve a single verification of the established denomination in the target language, especially for the translation of titles of legal instruments and other official documents (ITT1, with an average difficulty of 1). In the case of ITT2, 8.01% of units did not even require any verification due to familiarity with the names of key institutional bodies or positions. This explains ITT2's slightly lower difficulty average of 0.91. Finally, within legal phraseology, as expected, a difference was found between LEG-P1 (average of 0.79), more commonly used across legal genres and well-known to translators, and more genre-specific LEG-P2 formulations (average of 1), which systematically call for verification (e.g. "decides to remain actively seized of the matter" in UN resolutions). As opposed to the latter category, LEG-P1 includes a remarkable proportion of DIF-0 units (21.58%, the highest percentage for any discourse category), such as "by virtue of" or "pursuant to"; together with DIF-1 units (78.01%), such as "hereinafter referred to as" or "as last amended by", which require checking phraseological conventions within the relevant institutional genre.

Difficulty levels per setting and legal function
The breakdown of difficulty levels per institutional setting and legal function (Table 4 and Figure 4) shows the remarkable homogeneity of difficulty scores for all categories in all the sub-corpora. The only scores that diverge more considerably from the overall averages are due to coincidentally low or high difficulty values in sub-corpora that include a marginal number of annotations of the discourse category in question, and thus have no significant statistical impact on the relevant weighted averages per institutional setting or legal function. These exceptional cases are LEG-NAT-SIN's high score in UN-1 (2.67 for a density of 0.13 units per 1,000 tokens) and low score in WTO-1 (average difficulty of 1 for a 0.46 density), and a single LEG-P2 unit of DIF-2 in WTO-2 (density of 0.01).
Overall, the average scores for each institutional setting remain within a maximum deviation of 0.08 from the average for each discourse feature (see Figure 4), except for LEG-NAT-GEN (+/−0.11), POL-T (+/−0.13) and LEG-NAT-SIN (+/−0.18). The comparison of sub-corpora results according to primary legal functions also yields very uniform scores, within a differential range of +/−0.12 for all the annotation categories, except for LEG-NAT-SIN (see Table 4). This category registered the highest difficulty averages across settings and legal functions, while LEG-P1 has the lowest. The only exception is L1+(1), where LEG-NAT-GEN has the highest score (but this is for the most marginal density of annotations for any category within the functional sub-corpora sets, 0.06 units per 1,000 tokens), and LEG-GEN's average (0.72, as a result of the more significant weight of DIF-0 units such as "article") is below that of LEG-P1 (0.81, reflecting the wider diversity of simple phrasemes in law-making texts). Interestingly, the most convergent values, all close to an average difficulty of 1, are those that generally involve checking established organization-specific terminology and phraseological conventions (LEG-INT, ITT1 and LEG-P2).
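The pooling of sub-corpus scores into per-setting averages, and the maximum deviation from the overall average of a discourse feature, can be sketched as below. The text does not state the weighting used; this sketch assumes that each sub-corpus mean is weighted by its number of annotated units. The functions `weighted_means` and `max_deviation` and all figures are hypothetical and are not taken from Table 4.

```python
from collections import defaultdict


def weighted_means(rows):
    """Pool sub-corpus means into group means weighted by unit counts.

    rows: iterable of (group, mean_difficulty, n_units) for one category.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, mean_difficulty, n_units in rows:
        totals[group] += mean_difficulty * n_units
        counts[group] += n_units
    return {group: totals[group] / counts[group] for group in totals}


def max_deviation(rows):
    """Largest absolute gap between a group mean and the overall weighted mean."""
    rows = list(rows)
    overall = sum(m * n for _, m, n in rows) / sum(n for _, _, n in rows)
    return max(abs(m - overall) for m in weighted_means(rows).values())


# Hypothetical figures for one category in two settings (not from the study).
rows = [("EU", 1.0, 100), ("EU", 0.8, 100), ("UN", 1.2, 50), ("UN", 0.9, 150)]
print(weighted_means(rows))  # EU approx. 0.9, UN approx. 0.975
print(max_deviation(rows))   # approx. 0.0375
```

Weighting by unit counts is what keeps a sub-corpus with only a handful of annotations in a category from pulling the setting average far from the overall value.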
The average difficulty level per annotated unit is 0.94 in the EU sub-corpora, 0.98 in the UN sub-corpora and 0.96 in the WTO sub-corpora. The WTO set includes both the sub-corpus with the lowest average difficulty (WTO-1, 0.86) and the one with the highest (WTO-2, 1.02). As shown in Figure 5, the variation of averages per legal function follows a common pattern across the board: in all three settings and in the corpus as a whole, monitoring texts consistently registered the highest difficulty level per annotated unit, followed by adjudication genres and, finally, the law-making sub-corpora.
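To illustrate why a coincidentally high or low score in a sub-corpus with very few annotations barely moves the weighted averages discussed above, the following Python sketch compares a weighted and an unweighted mean. It assumes that per-setting averages are weighted by the number of annotated units in each sub-corpus; the function names are hypothetical, and only the 2.67 value echoes the text (LEG-NAT-SIN in UN-1), while all unit counts and the other averages are invented for illustration.

```python
# Hypothetical illustration (invented counts): a weighted average per
# setting dampens the effect of a sub-corpus with very few annotations.

def weighted_average(pairs):
    """Return mean difficulty weighted by annotated-unit counts.

    `pairs` is a list of (average_difficulty, number_of_units) tuples,
    one per sub-corpus (an assumed data layout).
    """
    total_units = sum(units for _, units in pairs)
    if total_units == 0:
        raise ValueError("no annotated units")
    return sum(avg * units for avg, units in pairs) / total_units


def unweighted_average(pairs):
    """Plain mean of the per-sub-corpus averages, ignoring unit counts."""
    return sum(avg for avg, _ in pairs) / len(pairs)


# One sub-corpus with only 3 units and an atypical average of 2.67,
# two larger sub-corpora with averages close to 1.3 (all invented).
example = [(2.67, 3), (1.30, 150), (1.25, 120)]

print(f"weighted:   {weighted_average(example):.2f}")      # 1.29
print(f"without it: {weighted_average(example[1:]):.2f}")  # 1.28
print(f"unweighted: {unweighted_average(example):.2f}")    # 1.74
```

In this invented case, dropping the small sub-corpus changes the weighted mean by only 0.01, whereas an unweighted mean of the three averages would be pulled up to 1.74.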

Aggregate difficulty levels
In order to compare the broader implications of the above results for the translation of the genres under examination, the accumulated decision-making effort associated with all the annotated discourse features was calculated by combining the density and difficulty scores for these features in each sub-corpus. The two richest sub-corpora in terms of terminological density (see Section 3.1), both produced in monitoring procedures of multilateral organizations, also yielded the top aggregate difficulty results (see Table 5 and Figure 6). WTO-2 (46.65) stands out essentially because of the impact of LEG-NAT-SIN and ECO-T, which carry the most salient relative weights in the entire dataset, 19.41 and 10.48, respectively. The second sub-corpus in terms of aggregate difficulty, UN-2 (32.92), includes the second highest value for LEG-NAT-SIN in the corpus (8.13) and the top POL-T score (6.96). WTO-3 (27.03) ranks third, closely followed by .
Unlike in the previous three sub-corpora, in EU regulations the terminology of non-legal domains demands more translation effort than legal terminology according to our data (13.82 vs. 5.49). The same applies to EU-2, UN-1 and, to a lesser extent, WTO-1. In contrast, the most significant proportions of aggregate difficulty for LEG-T are found in EU-3 and, especially, UN-3. As outlined above, their most common terminological category, LEG-GEN, is among the easiest to translate, which is reflected in the total difficulty scores of these sub-corpora and of L1+(3) as a whole. The ICJ Judges' opinions sub-corpus, in particular, has the lowest aggregate difficulty score of all, 16.52, crucially also because of the scarcity of SPEC-T terminology (aggregate difficulty of 2.32) in this genre.
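The text does not spell out how density and difficulty were combined. Assuming, as one plausible reading, that each category's relative weight is its density (units per 1,000 tokens) multiplied by its mean difficulty, and that a sub-corpus's aggregate score is the sum of these weights, the calculation can be sketched in Python as follows. The category labels come from the study's annotation scheme, but the function names and all numbers are hypothetical and are not figures from the study.

```python
# Hypothetical sketch of an aggregate terminological difficulty score,
# ASSUMING it is the sum over categories of density x mean difficulty.

def category_weights(profile):
    """Map each category to density (units per 1,000 tokens) x mean difficulty."""
    return {cat: density * mean_diff
            for cat, (density, mean_diff) in profile.items()}


def aggregate_difficulty(profile):
    """Sum the per-category weights into a single score for a sub-corpus."""
    return sum(category_weights(profile).values())


# Invented profile: (density, mean difficulty) per category.
sub_corpus = {
    "LEG-GEN": (10.0, 0.8),     # frequent but easy legal terms
    "LEG-NAT-SIN": (4.0, 1.9),  # rarer, harder singular national terms
    "ECO-T": (6.0, 1.1),        # economic terminology
}

for cat, weight in category_weights(sub_corpus).items():
    print(f"{cat:12s} {weight:6.2f}")
print(f"{'aggregate':12s} {aggregate_difficulty(sub_corpus):6.2f}")
```

Under this assumption, a frequent but easy category (here LEG-GEN) and a rarer but harder one (here LEG-NAT-SIN) can contribute similar amounts to the aggregate, which mirrors the contrast drawn above between genres dominated by LEG-GEN and those with many singular national terms.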
At this point, it is worth stressing again that our scores are not meant to measure the total decision-making effort involved in translating the genres examined, but only the overall translation difficulty of terminological and phraseological features as key indicators. If we examined other structural criteria, we would have to take into account, for example, that persuasive texts are generally considered more difficult to read than descriptive ones (e.g. Carrell and Connor 1991). In the case of ICJ Judges' opinions in our study, the less standardized stylistic features and structure of legal argumentation in these texts may constitute more significant sources of translation difficulty than in other genres of our corpus. These aspects, however, fall beyond the scope of this study.

Discussion and conclusions
Our findings provide the first empirical evidence to date on the difficulty of legal and other domain-specific terminology of international legal discourses in institutional translation from a cross-genre comparative perspective. Based on our analyses of the density of terminological and phraseological features, it can be empirically affirmed that the legal discourses examined encompass, on the one hand, legal terminology (including concepts of international and national law), institutional titles and legal phraseology that construct the structure and conceptual framework of their legal genres, and, on the other, a broad range of terminology from other subject areas addressed in legal texts. This other terminology is often more prominent than legal terminology, particularly in law-making and monitoring procedures, and accordingly requires more cognitive effort for translation, as revealed by assessing the translation difficulty of the annotated features. It is also within these non-legal terminological categories that the most marked variability is found between the institutional settings considered, in line with the policy areas and expertise of these organizations, i.e. more economic terminology in the WTO, more terms about political and social matters in the UN, and a more diverse distribution of themes in the EU, including the highest concentration of technical and scientific terminology in the entire LETRINT 1+ corpus analyzed in this study.
According to the difficulty scores obtained, the most difficult category for translation purposes is national legal terminology, especially terms designating singular entities and concepts. This finding corroborates the correlation between legal singularity and translation difficulty examined in the LETRINT project, and adds comparative validation to it. The recurrent references to the multiple national contexts in which international law is implemented, found in monitoring reports of multilateral organizations (in our study, WTO trade policy review reports and UN human rights treaty body reports), are a crucial reason why these genres rank at the top in aggregate terminological difficulty. In the EU context, however, it is in regulations, the most technical of all the law-making genres analyzed, that terminology demands the greatest translation effort.
These findings illustrate the implications of the thematic diversity of law for the professional practice of legal translation. They explain, for example, why non-legal terminology may require more time than legal terms when translating a legal act, while national legal terminology may proportionally demand more attention when translating reports on legal implementation, an area that is often neglected in institutional translation studies but is part of the everyday reality of many institutional translators (Prieto Ramos 2014: 316-320).
The confirmed correlation between legal singularity and higher difficulty levels aligns with the idea of cultural distance as a factor of translation difficulty (Shreve et al. 2004, cited in Sun 2015), which in legal translation is traditionally related to the distance between source and target legal systems (e.g. De Groot 1987). In our study, the only significant discrepancies in difficulty assessments according to target language were found in connection with national legal terminology. Yet they were limited to fewer than 7% of annotations of singular national terms, which in turn suggests considerable agreement between the perceptions of translators into French and Spanish when dealing with legal asymmetries.
Our results also suggest additional connections between singularity, frequency, familiarity and difficulty, thereby supporting the hypothesis that "more frequent or familiar words present less difficulty to translators" (Campbell 1999: 56). While singular terms are the least frequently repeated in our corpus and demand, on average, the most extensive research for decision-making, the reverse applies to more common legal terms and phrasemes that are easy to grasp and regularly reformulated by translators. Among non-legal domain terminology, this also applies to economic terms, which are used in international contexts more frequently than other specialized terms.
This brings us to questions of expertise and specialization among translators in difficulty measurements. While no two translator profiles are identical, the comparable levels of legal translation competence among LETRINT's annotators have enabled us to treat this as a constant factor in the analysis of translation difficulty, and to reinforce our focus and consistency in comparing difficulty levels per discourse feature, institutional translation setting and primary legal function. The consensus-building approach and other methodological safeguards applied for corpus compilation and annotation also help to ensure a high degree of representativeness and reliability.
It must be noted, however, that these results would differ to some degree had the annotators had other translation specialties or experience in translating texts for a single organization. In fact, our findings highlight the relevance of legal translation competence for dealing with a massive amount of legal terminology in international legal discourses, including cases of time-consuming legal incongruence. This expertise can explain the low proportion of high difficulty levels assigned to very singular terms, and may also be related to legal translators' familiarity with economic subjects as a sister specialization. In other words, although more empirical testing would be needed to measure competence-linked variations, the added value of legal translation competence for the genres studied can clearly be inferred from our findings. This is all the more so if we consider the suitability of this specialization for addressing other sources of difficulty that have not been examined in this study, for example, dissecting legal reasoning or semantic ambiguities, identifying legal genre conventions or resorting to relevant legal sources for decision-making (Prieto Ramos 2020).
The implications of this research for translation quality assurance, project management and translator training are many, especially in connection with translator profiling for specific positions or assignments, estimating difficulty for translation jobs and improving terminology management. In the context of machine-aided translation workflows, this kind of research can also help identify areas that deserve special attention when monitoring terminological resource needs, automated terminology retrieval processes, machine translation performance and sources of potential difficulty for quality checks. Some of these aspects are further explored in the LETRINT project.

several years as an assistant in the Translation Department and a doctoral candidate for the LETRINT project on institutional legal translation, within the Transius Centre. She has also worked as a translator and an interpreter since 2011, including a terminology internship at the United Nations Office at Geneva, a translation internship at the South Centre and, since 2019, various translation assignments for the United Nations Office at Vienna.