Self-mention and uncertain communication in the British Medical Journal (1840–2007): The decrease of subjectivity uncertainty markers

The communication of a scientific finding as certain or uncertain largely determines whether that information will be translated into practice. In this study, a corpus of 80 articles published in the British Medical Journal over 167 years (1840–2007) is analysed by focusing on three categories of uncertainty markers that explicitly reveal a writer's subjectivity: (1) I/we epistemic verbs; (2) I/we modal verbs; and (3) epistemic non-verbs conveying personal opinions. The quantitative analysis shows that their use progressively decreases over time, a trend that may be due to several variables, including the evolution of medical knowledge and practice, changes in medical research and within the scientific community, and more stringent guidelines for scientific writing (regarding types of articles, their structure and rhetorical style). […] for writers to downplay their personal role in order to highlight the phenomena under study, the replicability of research activities, and the generality of the findings, subordinating their own voice to that of unmediated nature. Such a strategy subtly conveys an empiricist ideology suggesting that research outcomes would be the same irrespective of the individual conducting the research.


Introduction
The communication of a scientific finding as certain or uncertain affects how it is translated into practice by the scientific community and by national governments. Distinguishing certain from uncertain information is therefore crucial both in science in the strict sense and in popular science.
Given the importance of this topic for practical decision-making, hedging, uncertainty, mitigation, and related phenomena in scientific writing have received increasing scholarly attention since the 1990s (e.g. Crompton 1997, 1998, Dudley-Evans 1994, Hyland 1994, 1998a, 1998b, 2004, 2014, Hyland and Milton 1997, Rozumko 2017, Salager-Meyer 1994, Skelton 1997, Vold 2006). More recently, researchers in the NLP community have focused on the detection of certainty and uncertainty markers (UMs) and their linguistic scope (e.g. Agarwal and Yu 2010, Bongelli et al. 2012, Farkas et al. 2010, Kim et al. 2009, Omero et al. 2020, Özgür and Radev 2009, Szarvas et al. 2012, Vincze et al. 2008, Zhou et al. 2011, 2015, Zou et al. 2013). Similarly, expressions of subjectivity and self-mention in academic writing have been extensively studied, predominantly from cross-cultural and cross-disciplinary perspectives (e.g. Fløttum 2005, 2006, Gao 2017, Hyland and Jiang 2016, Khedri 2016, Mur-Dueñas and Šinkūnienė 2016, Rongen Breivega et al. 2002, Salager-Meyer 1999a). Among diachronic studies of the expression of authors' subjectivity in biomedical articles, we can mention Atkinson (1992, 1996) and Salager-Meyer (1999b). In the former, a corpus of articles of different genres from the Edinburgh Medical Journal was analysed both rhetorically and linguistically.
In particular, the rhetorical analysis investigated how authors represent or "place" themselves in their texts, the text forms and discourse structures used to report research, and the nature of the "discourse communities" in which texts were situated; the linguistic analysis, based on Biber's (1988) multidimensional approach to register analysis, considered five polar dimensions: involved vs informational production, narrative vs non-narrative concerns, situation-dependent vs explicit reference, overt persuasive vs non-persuasive expression, and abstract vs non-abstract information. The results of Atkinson's analysis showed that medical research writing evolved in relation to significant changes, such as advances in medical knowledge and practice and the growth of a professional medical community. Overall, this evolution implies a shift from a narrative, author-centred perspective to a non-narrative, information-centred one. Similar results were obtained by Salager-Meyer (1999b), although her analysis (conducted on a corpus of medical articles published in 34 different British and American journals between 1810 and 1995) specifically concerns academic conflict in English medical discourse, considering both overt attacks and covert criticism. Her results outlined an evolution from a more direct, personal, polemical, and aggressive linguistic attitude to a more indirect, mitigated, and "fact/finding"-centred one. This evolution shows an increase in deference towards the authors being criticised and a decrease in the writer's commitment to the criticism expressed, and "reflects the shift from an author-centred and privately-based medicine to a fact-invoking, professionalized and highly competitive scientific community" (Salager-Meyer 1999b, 371).
Nonetheless, to the best of our knowledge, studies specifically dealing with the hedging functions of subjectivity markers are less numerous (Grabar et al. 2016, Hyland 2001, Hyland and Jiang 2018, Shehzad 2007, Walková 2019), and no work has specifically addressed subjectivity UMs in medical corpora from a diachronic perspective. In this study, we qualitatively and quantitatively analysed 80 articles randomly selected from the British Medical Journal (BMJ), covering a time span of 167 years (1840-2007), with two main aims: (a) to identify which and how many subjectivity UMs are present, testing whether their use varies significantly over time; and (b) to determine the linguistic scope (i.e. the linguistic influence) of each subjectivity UM, calculating the total amount of uncertainty they communicate.
Consistent with our hypotheses, and in line with the aforementioned studies describing an evolution away from an author-centred perspective, the results show a progressive decrease over time both in the number of these markers and in the percentage of uncertainty they communicate.
These results are discussed, without delving into the wider questions of the epistemology of science, in light of the parallel evolution of medical research (and of the sciences in general), on the one hand, and of scientific writing, on the other, considering the linguistic and rhetorical means used by authors to communicate their findings (Atkinson 1992, 1996, Bazerman 1988).
This descriptive study aims to contribute to the research on diachronic variation in scientific writing, particularly regarding hedging and subjectivity strategies.

Our previous studies on UMs: Theoretical and methodological approach
The study presented in this article stems from the main results of a Research Project of National Interest, funded by the Italian Ministry of Education, University and Research, which analysed the communication of certainty and uncertainty in three corpora of medical articles (both scientific and popular) from a diachronic perspective. The three corpora were composed, respectively, of:
1. 80 articles randomly selected from the BMJ 1840-2007 (available at PubMed Central, http://www.ncbi.nlm.nih.gov/pmc/journals/3/, last accessed February 2012);
2. 12 articles randomly selected from the BMJ 2013 (BMJ archive, http://www.bmj.com/archive, section "Research");
3. 12 popular articles randomly selected from Discover Magazine 2013 (http://discovermagazine.com, section Health and Medicine, http://discovermagazine.com/topics/health-medicine).
In our previous studies (Bongelli et al. 2012, 2019, Omero et al. 2020, Zuczkowski et al. 2016), we adopted a mixed procedure of analysis, combining a top-down and a bottom-up approach to identify UMs, together with an epistemic stance perspective (Zuczkowski et al. 2017) on certainty and uncertainty, which considers exclusively the point of view of the writer(s) (i.e. the author(s) of the article) in the here and now of their communication. This procedure differs from that of scholars of hedging in scientific writing such as Holmes (1988), Hyland (1995), and Hyland and Milton (1997), who mainly adopted a top-down approach, drawing the UMs used in their analyses from grammar books and dictionaries.
First, all the articles were manually edited and then examined in two phases by a group of analysts to detect all the UMs (Zuczkowski et al. 2016). Second, they were converted into plain text files (.txt) so that the manual analysis could be refined with WordSmith Tools (Scott 2012).
This analysis led us to identify seven categories of UMs: (1) epistemic verbs in the simple present (e.g. I/we think, believe, suppose, imagine; to seem, etc.); (2) modal verbs in the simple present, in their epistemic (non-deontic) use (e.g. I/we can, may, must, etc.); (3) modal verbs in the conditional mood, in their epistemic use (e.g. I/we could, might, etc.); (4) epistemic non-verbs (adjectives, adverbs, nouns, and expressions such as possible, perhaps, doubt, in my opinion, etc., which may even occur together with a non-epistemic verb, as in we are in doubt); (5) if clauses (excluding the zero conditional, in which if can be paraphrased by a temporal conjunction and therefore communicates certainty); (6) uncertain questions (i.e. questions that communicate not something unknown to the author but something uncertain, that is, a hypothesis to be confirmed, as with polar, alternative, and tag questions; see Zuczkowski et al. 2016, 2019, 2021); and (7) epistemic future.
The identification of these categories of markers was partially consistent with those proposed by other authors, who recognised the hedging function of epistemic verbs and modal verbs (Crompton 1997, Hyland 1994, Myers 1989, Salager-Meyer 1994, Skelton 1988), probability adverbs (Crompton 1997, Hyland 1994, Myers 1989, Salager-Meyer 1994), probability adjectives (Crompton 1997, Hyland 1994, Salager-Meyer 1994), and if clauses (Crompton 1997, Hyland 1994). Our classification differs in that it considers only the writer(s)' perspective, and it also includes uncertain questions and epistemic future. After identifying and counting the UMs, we quantified their linguistic scope. The scope of a UM is the stretch of language affected by it, i.e. "the part of the sentence that is modified by the cue" (Szarvas et al. 2012, 353), the constituents that fall "within the uncertain interpretation" (Farkas et al. 2010, 3). For example, in the following extract from the corpus of 80 articles (BMJ 1840-2007), the linguistic scope of the marker It is not quite clear to me includes 15 tokens (in bold):
It is not quite clear to me how the infection is conveyed in such cases. We have always assumed that the streptococci on the fauces will readily find their way into the saliva and be transferred by droplet spray on coughing, etc. (Colebrook 1933, 724)
Furthermore, all the .txt files were imported into the Knowtator text annotation tool¹ (integrated with the Protégé knowledge representation system)², and the manual annotations of the UMs (including their linguistic scope) were replicated in order to build and train an algorithm for the automatic detection of such markers (Omero et al. 2020).
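As a rough check on the token count in the Colebrook (1933) extract, the Python sketch below counts whitespace-delimited tokens in the marker and in the first sentence of the extract. It assumes, as our own reading, that the 15-token bold span coincides with that whole sentence (marker included), and it uses a simple tokenization (split on whitespace, strip sentence punctuation) that may differ from the one applied by the annotation tools. The function name `count_tokens` is hypothetical.

```python
# Illustrative only: whitespace tokenization is an assumption, not the
# tokenizer used by WordSmith Tools or Knowtator.
def count_tokens(text: str) -> int:
    """Count whitespace-separated tokens, ignoring sentence punctuation."""
    tokens = [t.strip(".,;:!?") for t in text.split()]
    return sum(1 for t in tokens if t)  # skip tokens that were only punctuation


marker = "It is not quite clear to me"
sentence = "It is not quite clear to me how the infection is conveyed in such cases."

print(count_tokens(marker))    # 7 tokens in the marker itself
print(count_tokens(sentence))  # 15 tokens: the marker plus the clause it affects
```

Under this reading, the marker accounts for 7 of the 15 tokens and the clause it qualifies for the remaining 8.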
These procedures allowed us to determine the total amount of uncertainty (i.e. the UMs plus their linguistic scope) in each article and in the whole corpus and, by difference, also that of certainty: the amount of certainty was calculated as the difference between the total number of words in each article and the number of words communicating uncertainty.
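The calculation by difference can be illustrated with a minimal Python sketch. The word counts below are hypothetical, and the function name `certainty_share` is ours; the sketch assumes that the uncertain words of an article (markers plus their scopes) have already been counted.

```python
def certainty_share(total_words: int, uncertain_words: int) -> tuple[float, float]:
    """Return (certainty %, uncertainty %) for one article.

    uncertain_words = tokens of all UMs plus the tokens in their scopes;
    certainty is obtained by difference from the article length.
    """
    if not 0 <= uncertain_words <= total_words:
        raise ValueError("uncertain words must lie between 0 and the article length")
    certain_words = total_words - uncertain_words
    return 100 * certain_words / total_words, 100 * uncertain_words / total_words


# Hypothetical article of 2,500 words, 500 of which fall under UMs and their scopes.
cert, unc = certainty_share(total_words=2500, uncertain_words=500)
print(f"certainty {cert:.1f}%, uncertainty {unc:.1f}%")  # certainty 80.0%, uncertainty 20.0%
```

Applying the same function to word counts summed over all articles would give the corresponding figures for the whole corpus.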
According to our perspective, certainty and uncertainty are two epistemic contraries encoded in linguistic communication (see Zuczkowski et al. 2017, ch. 4, for an experimental demonstration). The theoretical basis of our model (Zuczkowski et al. 2017, 2021) concerns epistemic stance as a linguistic, communicative notion, not a mental one, conveyed by the linguistic expressions used by speakers/writers in a given context. The model identifies three epistemic positions: the Known/Certain, the Uncertain, and the Unknown (lack of information).
Information can be communicated either as certain (known), when it is communicated as if it were true (regardless of whether it actually is), or as uncertain (not known whether, or believed), when it is communicated as if it were possibly true or possibly false (regardless of whether it actually is true or false). In other words, certainty and uncertainty concern the speakers'/writers' epistemic commitment to the truth of the information, which can therefore be communicated as certain, probable, possible, etc.
Uncertainty, which in our perspective includes both possibility (as expressed, for example, by the epistemic use of modal verbs and by expressions such as it is possible/probable, etc.) and subjectivity (i.e. the communication of the writers' points of view, as in the expressions in my opinion, according to my view, I think, etc.), is communicated through the markers listed above. Certainty, in turn, is communicated through epistemically unqualified declarative sentences (Aijmer 1980) and rhetorical questions, as well as through markers such as evidential and epistemic verbs (e.g. I/we know, I/we see, I/we remember, etc.), verbal expressions (e.g. I/we have no doubts, I'm/we are convinced, etc.), adverbs (e.g. certainly, surely, etc.), adjectives (e.g. it is sure, certain, etc.), and so on. In other words, certainty is communicated both by declarative sentences without any markers and by explicit markers of certainty, which, unlike those of uncertainty, function as reinforcing markers (or boosters, in Holmes' (1984) terminology). For example, in our view, the author of the following excerpt communicates certainty throughout, although in the last sentence certainty is boosted by the adverb "certainly":
There is a fever in the tropics (for want of a better name, I call it tropical fever) which possesses certain characteristics of its own. Akin to malarial fever, and also to enteric, it cannot correctly be designated by either name. It is certainly sporadic, and usually attacks adults. (Sherman-Bigg 1882, 607)
Specifically, for one of the three corpora mentioned above, namely the one investigated in the present study (80 articles randomly selected from the BMJ from 1840 to 2007 and stratified into four time periods: 1840-1880, 1881-1920, 1921-1960, and 1961-2007; see Section 3.1.1), our analysis revealed that certainty and uncertainty account for 80% and 20% of the whole corpus, respectively; although these percentages vary across the four periods, from 77% to 84% for certainty and from 16% to 23% for uncertainty, the differences are not significant (Zuczkowski et al. 2016).
In other words, the proportions of certainty and uncertainty remained stable over the 167-year span: writers used uncertainty in similar quantities throughout, and always to a lesser extent than certainty.

The present study: Subjectivity UMs
In this study, we focused on a specific type of UM, namely subjectivity uncertainty markers (SUMs). By this term, we refer to markers of uncertainty and/or subjectivity that contain an explicit reference to the author(s) through personal pronouns (I, we, me, us), possessives (our, my, mine, etc.), or adverbs (personally).
In particular, of the seven categories of UMs presented in the previous section, we investigated the uncertainty communicated by: (1) I/we modal verbs in the first person singular and plural; (2) I/we epistemic verbs in the first person singular and plural; and (3) epistemic non-verbs conveying personal opinions.
The main research questions were as follows:
(1) Which SUMs are present in the corpus, and how many?
(2) Are there any significant variations in their use over time?
(3) What is their linguistic scope (i.e. their linguistic influence), and what is the total amount of uncertainty they communicate?

Corpus
As we were interested in studying the use of SUMs in biomedical scientific writing from a historical perspective, hypothesising possible variations over time, we chose to analyse the largest of our three corpora previously annotated for uncertainty, composed of 80 articles randomly selected from the BMJ and covering a time span of 167 years (1840-2007).
In this analysis, we maintained the distinction between four time periods (a conventional division based on significant phases in both history and the evolution of medicine):
1. 20 articles published between 1840 and 1880. This is a fascinating period for medicine, in that physiology and experimentation were beginning to emerge in Europe. Among the most important medical discoveries of this period are the first uses of anaesthesia (Wells/Morton 1844/46), the prevention of the transmission of puerperal fever (Semmelweis 1847), the development of the syringe (Pravaz and Wood 1853), the foundation of cellular pathology (Virchow 1858), the laws of inheritance (Mendel 1865), and antiseptic surgical methods (Lister 1867).
2. 20 articles published between 1881 and 1920. The period includes the turn of the century, with all the excitement about the apparent absence of limits to what science could do and the belief that it could predict everything. Among the most important medical findings of this period are the discovery of the TB and cholera bacilli by Koch (1882, 1884), the first effective rabies vaccine (Pasteur 1886), the discovery of antitoxins against tetanus and diphtheria (von Behring 1890), X-rays (Röntgen 1895), radioactivity (Becquerel 1896), the synthesis and manufacture of aspirin (Bayer 1899), the blood group classification system (Landsteiner 1901), and the pioneering use of the ECG (Dudley 1913).
3. 20 articles published between 1921 and 1960. This was predominantly a period of war throughout the world, with both the harm and the benefits it brought to science. Among the most important medical discoveries of this period are the first use of insulin to treat diabetes (1922) and the first vaccines for diphtheria (1923), tuberculosis and tetanus (1927) […]
These 80 articles deal with different topics. The most frequent are descriptions of diagnostic techniques, surgical and pharmacological treatments, and epidemiological studies, mostly concerning physical diseases.
The random selection collected different types of articles, with a different distribution over the four periods, as shown in Table 1.
While the 40 articles of the first and second periods are predominantly Clinical cases and Reviews, those of the third and fourth periods are mostly Original research papers. Unlike the former, this type of article has a more systematic structure.
Indeed, in the third and fourth periods, 21 articles (11 in the third and 10 in the fourth) have the IMRaD (Introduction, Methods, Results and Discussion) structure or something similar (Gross et al. 2002). These articles are Original research papers (19) or Clinical cases (2). The other 19 articles of these two periods are Reviews, Reports, and Letters and, as expected, do not have an IMRaD structure. In our analysis, we also considered proto-versions of the IMRaD model, in which a defined, though not yet standardised, structure can be observed. The first article with such a proto-IMRaD structure belongs to the third period (E. Spriggs, The early recognition and treatment of cancer of the stomach, 1928), whereas the first article in our corpus with a well-defined IMRaD structure dates from 1958 (J. H. Burn and M. J. Rand, Action of nicotine on the heart); a standardised IMRaD structure seems to be established from 1963 onwards (C. P. Lowther and R. W. D. Turner, Guanethidine in the treatment of hypertension, 1963).
Most of the 80 articles have a single author who is a native speaker of British English (native-speaker status was assessed on the basis of the authors' surnames and institutional affiliations, as in Salager-Meyer 1999b). This applies to 19 articles of the first period, all 20 articles of the second, and 14 articles of the third, but only 5 of the fourth, where groups of authors (in some cases not British) prevail.

Procedure and data analysis
Starting from the Knowtator tags, we extracted from each article those UMs containing an explicit self-reference (by a personal pronoun, a possessive, or an adverb). On the basis of the seven categories described above (see Section 2), we created an Excel spreadsheet in which we recorded, for each article, the SUMs, the string of text in which each appears, its scope, and other relevant information about the article itself.
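The self-reference filter described above can be sketched as a simple lexical check on markers already annotated as UMs. The Python sketch below is illustrative only: the record fields (`marker`, `period`), the sample markers, and the function `is_sum` are hypothetical, and the word list is taken from the pronouns, possessives, and adverb listed in the definition of SUMs.

```python
import re

# Hypothetical UM records; in the study, UMs came from manual Knowtator annotations.
annotated_ums = [
    {"marker": "We may", "period": 4},
    {"marker": "I am inclined to think", "period": 2},
    {"marker": "seem to me", "period": 3},
    {"marker": "perhaps", "period": 1},
    {"marker": "it is possible", "period": 1},
]

# Explicit self-reference: I, we, me, us, my, mine, our, personally.
# "I" is matched case-sensitively so that, e.g., "i.e." is not picked up.
SELF_REFERENCE = re.compile(
    r"\bI\b|\b(?:[Ww]e|[Mm]e|[Uu]s|[Mm]y|[Mm]ine|[Oo]ur|[Pp]ersonally)\b"
)


def is_sum(marker: str) -> bool:
    """Treat a UM as a SUM if it refers explicitly to the author(s)."""
    return SELF_REFERENCE.search(marker) is not None


sums = [um for um in annotated_ums if is_sum(um["marker"])]
print([um["marker"] for um in sums])
# ['We may', 'I am inclined to think', 'seem to me']
```

Because the check runs only on markers already annotated as UMs, it does not have to decide whether a self-mention is epistemic (as opposed to, say, "I am justified in advocating it"); that judgement comes from the manual annotation.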
Of the seven categories of UMs listed above, SUMs occur in the following three:
1. I/we modal verbs in the first person singular and plural, in their epistemic (non-deontic) use, as in the following example, which contains the modal verb may in the first person plural (even if the article has a single author, "we" seems to be used to refer generically to the scientific community; see Fløttum 2006, 2012):
(1) We may have underestimated the effect of REM sleep in this study as the decline overnight may partially result from an accumulative effect of periods of REM sleep throughout the night. (Shapiro et al. 1986, 1163)
2. I/we epistemic verbs in the first person singular and plural, as in the following example, which contains two verbal expressions in the first person singular. The author first puts forward a cautious hypothesis about the causes of the patient's recovery ("I am inclined to think") and then takes a "not knowing whether" stance (Zuczkowski et al. 2017, 2021), expressing his uncertainty about the role of milk as a medium for bacterial cultures ("I […] do not even know whether"), owing to his lack of expertise in bacteriology:
(2) I am inclined to think that she owes her recovery to the dilution of the toxins by, and the envelopment of the pathogenic bacteria in, the large quantity of milk which was free in the peritoneal cavity. I am not a bacteriologist, and do not even know whether milk is a good culture medium for the various bacteria which one finds in the stomach and the upper part of the incision closed by suture. (Roper 1908, 786)
Within this category, we distinguished the subcategory represented by the verbs to seem and to appear. Although these verbs have an epistemic meaning with any personal pronoun (I, you, he, she, it, we, they), as they always refer to the I/we of the author(s), we included in the SUM category only the occurrences with an explicit reference to the author(s) (i.e. seem(s) or appear(s) to me/to us), as in the following example, in which the author expresses his personal opinion in favour of using an antiseptic to prevent puerperal fever:
(3) These results seem to me to promise a substantially greater margin of safety than heretofore, since the hands of the doctor and midwife (and also the vulva) treated with dettol or iodine will be provided with a lasting chemical barrier against infection. (Colebrook 1933, 726)
3. Epistemic non-verbs conveying personal opinions. The following excerpt contains two occurrences of this SUM category ("In my opinion" and "I am farther of opinion"), both explicitly expressing a personal point of view about fertilisation in human females:
(4) In my opinion, therefore, it is not the ovum of a past menstrual period but a younger and more readily responsive ovum which is invariably fertilized, and I am farther of opinion that the extrusion of such from the ovary is determined largely, if not entirely, by the mere presence of vigorous spermatozoa in the female genital tract. (Oliver 1907, 1568)
In some cases, UMs in general and SUMs in particular can occur in clusters, as in the two following excerpts. In example (5), an epistemic verb ("I think") is immediately followed by an epistemic non-verb conveying a personal opinion ("from my own experience"): the author's point of view is grounded in his personal clinical experience. The same excerpt also contains other expressions of the author's subjectivity that have no epistemic meaning, such as "I am justified in advocating it" and "according to our present obstetric rule."
(5) Notwithstanding the unfavourable aggregate results of the Caesarean section in Great Britain and Ireland, I think, from my own experience, shown in the above statements, I am justified in advocating it as an operation of election, not merely having recourse to it as one of necessity, according to our present obstetric rule, when no other means can suffice, but to give it a preference over the use of the crotchet, in cases when neither premature labour, the long forceps, these two operations combined, or turning, will meet the exigencies of the case. (Radford 1849, 460)
In example (6), there is first an epistemic verb ("I believe") and then a modal verb in the first person plural ("we may"); the UM perhaps is also present. The author expresses his cautious opinion about a probably erroneous scientific judgement concerning the use of potato, mixed with wheaten flour, in bread, a mix that, he believes, most bakers use. His opinion is based on the results of chemical studies (by other authors: he is a physician) showing that potato, since it contains a vegetable acid, can help counteract the onset of scurvy:
(6) I believe very few bakers omit to use it, and perhaps we may have acted unwisely in condemning this adulteration. (Barrett 1848, 177)
On the basis of these data, we performed a quantitative analysis by (a) calculating the frequencies and percentages of SUMs and their linguistic scope, both for the corpus as a whole and for each time period, and (b) comparing, globally and for each time period, the frequencies and percentages of UMs and their linguistic scope with those of SUMs and their linguistic scope.

Results
As shown in the following sections, and in line with our hypotheses, the quantitative analysis reveals a progressive decrease of SUMs over the four periods. In particular, the uncertainty communicated through the authors' self-mentions gradually fades between the first two periods (1840-1920) and the last two (1921-2007).

Quantitative results: SUMs and Non-SUMs in the whole corpus and in each period
Of the 80 articles in our corpus, 52 (65%) contain at least one SUM: 15 of the 20 articles in the first period (75%), 17 in the second (85%), 13 in the third (65%), and 7 in the fourth (35%). As shown in Table 2, of the 2,808 UMs in the whole corpus, 2,492 (88.75%) are non-SUMs and 316 (11.25%) are SUMs. This difference is statistically significant (χ²(1, N = 2,808) = 1686.24, p < 0.0001) and very large: the ratio of non-SUMs to SUMs is 7.88, meaning that SUMs are almost eight times less frequent than non-SUMs in our corpus, which suggests that scientific writers prefer to communicate uncertainty without explicit self-mentions.
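These summary figures can be recomputed from the reported counts. The Python sketch below assumes that the chi-square test is a goodness-of-fit test against an equal split of SUMs and non-SUMs; the text does not state the expected frequencies, but this assumption reproduces the reported statistic (and the 7.88 ratio) up to rounding in the last decimal. The variable names are ours.

```python
# Counts reported for the whole corpus.
non_sums, sums = 2492, 316
n = non_sums + sums  # 2,808 UMs

# Chi-square goodness-of-fit against an equal split (1 degree of freedom).
expected = n / 2
chi2 = sum((obs - expected) ** 2 / expected for obs in (non_sums, sums))

print(f"non-SUMs {100 * non_sums / n:.2f}%, SUMs {100 * sums / n:.2f}%")
print(f"chi2(1, N={n}) = {chi2:.2f}")
print(f"non-SUM/SUM ratio = {non_sums / sums:.3f}")

# Share of articles with at least one SUM; each period contains 20 articles.
articles_with_sums = {"1840-1880": 15, "1881-1920": 17, "1921-1960": 13, "1961-2007": 7}
for period, k in articles_with_sums.items():
    print(period, f"{100 * k / 20:.0f}%")
print("whole corpus", f"{100 * sum(articles_with_sums.values()) / 80:.0f}%")
```

With SciPy available, `scipy.stats.chisquare([2492, 316])` performs the same equal-expectation test and also returns the p-value.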
The histogram in Figure 1 shows the inverse relationship between SUMs and non-SUMs across the four periods: as non-SUMs increase, SUMs decrease, until they almost disappear from the biomedical articles in the final period.

Quantitative results: SUMs in the whole corpus and in each period
As for SUMs specifically, the quantitative analysis reveals that, in the whole corpus, the most numerous category is I/we modal verbs, followed by I/we epistemic verbs (Table 4). Together, they represent 80% of all SUMs in the whole corpus, whereas epistemic non-verbs conveying personal opinions and the subcategory of the verbs "to seem" and "to appear" represent only 20%. This indicates that, when scientific writers communicate uncertainty through explicit self-mentions, they seem to prefer modal and epistemic verbs to epistemic non-verbs conveying personal opinions and to verbs such as seem(s) to me/us and appear(s) to me/us. The differences among the individual SUM categories are statistically significant (χ²(3, N = 316) = 278.3, p < 0.0001).
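The 80/20 split can be checked against the category totals. In this Python sketch, the counts for I/we epistemic verbs (117) and epistemic non-verbs (49) are those reported later in this section; the count for I/we modal verbs (136) is our derivation from the reported 43.04% of 316, and the 14 occurrences of seem(s)/appear(s) to me/us are the remainder, which matches the three, four, and seven occurrences reported per period.

```python
# 117 and 49 are reported in the text; 136 and 14 are derived (see lead-in).
sum_categories = {
    "I/we modal verbs": 136,
    "I/we epistemic verbs": 117,
    "epistemic non-verbs (personal opinions)": 49,
    "seem(s)/appear(s) to me/us": 14,
}
total = sum(sum_categories.values())  # 316 SUMs

verbs = sum_categories["I/we modal verbs"] + sum_categories["I/we epistemic verbs"]
others = total - verbs  # non-verbs plus seem/appear
print(f"modal + epistemic verbs: {100 * verbs / total:.1f}%")   # 80.1%
print(f"non-verbs + seem/appear: {100 * others / total:.1f}%")  # 19.9%
```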
Taking the four time periods separately (Table 5), the analysis reveals that in the fourth period, although the total is very low for each category, the most frequent SUMs are epistemic non-verbs conveying personal opinions (5 of 13 SUMs, i.e. 38.46% of the total).
In the following sections, we present the results of the quantitative analysis for each SUM category, both for the whole corpus and for each period.
I/we modal verbs in the whole corpus. Within the I/we modal verb category (which represents 43.04% of all SUMs), the most frequent subcategory is I/we may (34.56%), as shown in Table 6. Adding I/we can to I/we may (47 + 24 occurrences), the share rises to over 50% of the total.
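The I/we may and I/we can shares can be verified in the same way. The Python sketch below takes 136 as the size of the I/we modal verb category, a figure we derive from the reported 43.04% of the 316 SUMs; it is consistent with 47 occurrences of I/we may corresponding to 34.56%.

```python
modal_total = 136  # derived: 43.04% of the 316 SUMs
may, can = 47, 24  # I/we may and I/we can, as reported for Table 6

print(f"I/we may: {100 * may / modal_total:.2f}%")                    # 34.56%
print(f"I/we may + I/we can: {100 * (may + can) / modal_total:.2f}%")  # 52.21%
```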
I/we modal verbs for each period. The use of I/we modal verbs (Table 7) fluctuates over time: it increases from the first period (24.26%) to the second (39.71%), decreases in the third (33.09%), and almost disappears in the fourth (2.94%).
The most frequent I/we epistemic verbs in the whole corpus. As shown in Table 8, of the 117 occurrences of epistemic verbs in the first person singular and plural, the most frequent are, in order, "I/we think" (and similar expressions, such as "I am inclined to think," "we are bid to think," "I venture to think," and so on) and "I/we believe" (and similar expressions, such as "we are every reason to believe" and "we do not believe"). Together, they account for almost 70% of all I/we epistemic verbs. Other verbs and verbal expressions, such as "I am not quite sure," were labelled "Other verbs."
I/we epistemic verbs for each period. As shown in Table 9, I/we epistemic verbs decrease progressively over time, from 43.59% in the first period to 22.22% in the third; in the fourth period, they almost disappear.
The most frequent expressions of personal opinions in the whole corpus. As shown in Table 10, of the 49 occurrences of epistemic non-verbs conveying personal opinions, the most frequent are "in my opinion" (10 = 20.41%) and "in my/our experience" (7 = 14.29%), followed by "personally"³ (5 = 10.20%) and "my (own) opinion" (4 = 8.16%). Other epistemic non-verbs, such as "to my mind" and "in our judgment," were labelled "Other expressions."
Expressions of personal opinions for each period. The use of epistemic non-verbs conveying personal opinions (Table 11) fluctuates over time: it rises sharply from the first period to the second (from 18.37% to 38.78%), then declines in the third (32.65%) and again in the fourth (10.20%).
Seem(s)/Appear(s) to me/to us verbs in the whole corpus. Finally, the verbs "seem(s) to me/us" and "appear(s) to me/us" are the least used category of SUMs in the whole corpus (Table 12).
Seem(s)/Appear(s) to me/to us verbs for each period. The verbs "seem(s) to me/us" and "appear(s) to me/us" increase slightly over the first three periods, although their presence remains very low (three, four, and seven occurrences, respectively), and then disappear completely in the fourth period (Table 13).
"I" vs "we"
Table 14 shows that, of the 316 SUMs present in 52 articles of our corpus:
- 193 refer to the first person singular (e.g. "I think," "seems to me," etc.);
- 123 refer to the first person plural (e.g. "we believe," "seem to us," etc.).
It is also worth noting that "we" is used as a rhetorical device even in single-authored articles, which predominate in the first three periods.

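The two counts from Table 14 can also be expressed as shares of the 316 SUMs. A short Python sketch; the dictionary keys are illustrative labels.

```python
# First-person SUM counts reported for Table 14.
counts = {"I (first person singular)": 193, "we (first person plural)": 123}
total = sum(counts.values())  # 316 SUMs found in 52 articles

# Share of each form among all SUMs, rounded to two decimals.
percentages = {form: round(100 * n / total, 2) for form, n in counts.items()}
for form, pct in percentages.items():
    print(f"{form}: {pct:.2f}%")
```

This gives roughly 61% singular and 39% plural over the whole corpus.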
Scope
As previously observed for the UMs, the amount of uncertainty (UMs + their linguistic scope) communicated by SUMs is always lower than the amount communicated by non-SUMs, both in the whole corpus and in each time period.
Out of the total amount of uncertainty present in the whole corpus, the percentage of uncertainty communicated by SUMs is 15.97%, while that communicated by non-SUMs is 84.03% (Table 15).
Although such difference changes over the four time periods, in each of them the amount of uncertainty communicated by SUMs is always lower than that communicated by non-SUMs.
Specifically, the uncertainty communicated by SUMs decreases from 22.19% in the first period to 19.79% in the second, 14.64% in the third, and 3.32% in the fourth.
Correspondingly, the uncertainty communicated by non-SUMs increases from 77.81% in the first period to 80.21% in the second, 85.36% in the third, and 96.68% in the fourth.
In other words, the uncertainty communicated in the first person progressively disappears, while the uncertainty communicated without authors' self-mentions progressively increases.
Specifically, the share of uncertainty communicated by SUMs in the first period is:
- 1.12 times as high as in the second (first period SUM to second period SUM ratio);
- 1.51 times as high as in the third (first period SUM to third period SUM ratio);
- 6.69 times as high as in the fourth (first period SUM to fourth period SUM ratio).
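The period ratios are quotients of the SUM shares, and each non-SUM share is the complement to 100%. A minimal Python sketch recomputing both from the rounded percentages reported in the text; the variable names are illustrative.

```python
# Share (%) of total uncertainty communicated by SUMs in each period,
# as reported in the text; the non-SUM share is the complement to 100%.
sum_share = {1: 22.19, 2: 19.79, 3: 14.64, 4: 3.32}

for period, pct in sum_share.items():
    print(f"period {period}: SUMs {pct:.2f}%, non-SUMs {100 - pct:.2f}%")

# Ratio of the first-period SUM share to each later period's SUM share.
ratios = {p: round(sum_share[1] / sum_share[p], 2) for p in (2, 3, 4)}
print(ratios)
```

Computed from the rounded percentages, the ratios come out as 1.12, 1.52, and 6.68. The last two differ in the final decimal from the reported 1.51 and 6.69, presumably because the published ratios were computed from unrounded values.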
Two variables that may help explain these results are the types of articles and the progressively more consistent use of the IMRaD structure.

SUMs in different types of articles
As for the types of articles, as mentioned before, our corpus is rather diversified owing to the random selection. Taking this variable into account reveals interesting differences in the amount of uncertainty communicated by SUMs and by non-SUMs (Table 16). Lectures are the type of article with the highest percentage of uncertainty communicated by SUMs (28.86%), whereas original research papers have the highest percentage communicated by non-SUMs, i.e. without resorting to self-mention (89.28%).
Comparing the percentage of SUMs in Original research papers with that in Lectures and Clinical cases, we find that it is:
- 2.69 times lower than in Lectures (Lecture SUM to Original research paper SUM ratio);
- 2.10 times lower than in Clinical cases (Clinical case SUM to Original research paper SUM ratio).
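Because SUMs and non-SUMs together account for all the uncertainty in each article type, the SUM share of Original research papers is the complement of their 89.28% non-SUM share. A minimal Python sketch of the Lecture ratio under that reading; the implied Clinical case share is back-calculated here for illustration and is not a figure stated in the text.

```python
# Percentages of uncertainty reported in the text for Table 16.
lecture_sum = 28.86         # Lectures: share communicated by SUMs
original_non_sum = 89.28    # Original research papers: share via non-SUMs

# SUMs and non-SUMs make up 100% of the uncertainty in each article type,
# so the SUM share of Original research papers is the complement (10.72%).
original_sum = 100 - original_non_sum

ratio = round(lecture_sum / original_sum, 2)   # Lecture-to-Original SUM ratio

# Back-calculated (assumption): the reported Clinical case ratio of 2.10
# implies a Clinical case SUM share of roughly 2.10 * 10.72, about 22.5%.
implied_clinical_sum = 2.10 * original_sum

print(f"Original research SUM share: {original_sum:.2f}%")
print(f"Lecture / Original ratio: {ratio:.2f}")
print(f"Implied Clinical case SUM share: {implied_clinical_sum:.1f}%")
```

The implied Clinical case share is only approximate, since both inputs are themselves rounded.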
Although, as noted above, the amounts of uncertainty communicated by SUMs and by non-SUMs differ considerably across article types, the uncertainty communicated by non-SUMs is always greater than that communicated by SUMs. In this case too, the difference between them (i.e. between uncertainty communicated by SUMs and by non-SUMs) is statistically significant. Specifically: Clinical cases: χ²(1, N = 6,631) = 2003.6, p < 0.0001; Lectures: χ²(1, N = 3,974) = 710.21, p < 0.0001; Letters: χ²(1, N = 371) = 189.28, p < 0.0001;
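Each reported statistic has one degree of freedom, but this section does not state which test was used. A minimal sketch, assuming a Pearson goodness-of-fit test of the SUM and non-SUM amounts against an equal expected split; the function name `chi_square_even_split` is hypothetical, and the p-value uses the identity between the one-degree-of-freedom chi-square tail and the standard normal tail, so only the standard library is needed.

```python
import math


def chi_square_even_split(a: int, b: int) -> tuple[float, float]:
    """Pearson chi-square goodness-of-fit test of two counts against an
    equal expected split (df = 1). Returns (statistic, p-value)."""
    n = a + b
    expected = n / 2
    stat = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # With one degree of freedom, P(X > x) for X ~ chi-square(1) equals
    # P(|Z| > sqrt(x)) for a standard normal Z, i.e. erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts back-calculated from the Lectures figures
# (N = 3,974; SUM share 28.86%), not taken from the paper's tables.
sum_units, non_sum_units = 1147, 2827
stat, p = chi_square_even_split(sum_units, non_sum_units)
print(f"chi2(1, N = {sum_units + non_sum_units}) = {stat:.2f}, p = {p:.3g}")
```

With these back-calculated counts the sketch gives χ² ≈ 710.22, within 0.01 of the reported 710.21, which is consistent with (though not proof of) the equal-split assumption.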

SUMs in IMRaD articles (third and fourth periods)
As for the second variable, i.e. the IMRaD vs non-IMRaD structure of the articles, our corpus is, as mentioned above, also rather diversified: of the 80 articles, 21 (11 in the third period and 10 in the fourth) have an IMRaD (or similar) structure, while 59 do not.
Taking this variable into account reveals some differences in the amount of uncertainty communicated by SUMs and by non-SUMs (Table 17).
Furthermore, the following should be noted:
- in the whole corpus, the uncertainty communicated by SUMs is 15.97% of the total uncertainty (i.e. the uncertainty communicated by SUMs and by non-SUMs);
- in the 59 articles without IMRaD structure, the uncertainty communicated by SUMs is 18.32% of the total uncertainty, i.e. 2.35 percentage points higher than in the whole corpus;
- in the 21 IMRaD articles (third and fourth periods), the uncertainty communicated by SUMs is 10.39% of the total uncertainty in this sub-corpus, i.e. 5.58 percentage points lower than in the whole corpus.
This means that the share of SUMs in IMRaD articles is 1.76 times lower than in non-IMRaD articles (non-IMRaD SUM to IMRaD SUM ratio).
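The comparisons above use two different measures: differences in percentage points relative to the whole corpus, and a ratio between the two sub-corpora. A minimal Python sketch computing both from the reported shares; the variable names are illustrative.

```python
# SUM shares (% of total uncertainty) reported in the text for Table 17.
whole_corpus = 15.97   # all 80 articles
non_imrad = 18.32      # 59 articles without IMRaD structure
imrad = 10.39          # 21 IMRaD articles (third and fourth periods)

# Differences in percentage points relative to the whole corpus.
non_imrad_gap = round(non_imrad - whole_corpus, 2)
imrad_gap = round(imrad - whole_corpus, 2)

# Ratio of the non-IMRaD SUM share to the IMRaD SUM share.
ratio = round(non_imrad / imrad, 2)

print(f"non-IMRaD vs whole corpus: {non_imrad_gap:+.2f} points")
print(f"IMRaD vs whole corpus: {imrad_gap:+.2f} points")
print(f"non-IMRaD / IMRaD: {ratio:.2f}")
```

A percentage-point difference and a ratio answer different questions: the first measures the absolute gap, the second how many times larger one share is than the other.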

Discussion and conclusion
This study is part of a wider research project exploring the communication of certainty and uncertainty in medical articles from a diachronic perspective. The project has yielded both theoretical and practical contributions; the latter concern the construction and training of an algorithm for the automatic detection of UMs in English (Omero et al. 2020). The purpose of the current study is mainly to describe variation over time in the use of subjectivity UMs in a diachronic corpus of medical articles, in order to contribute to research on diachronic variation in scientific writing, particularly regarding hedging and subjectivity strategies.
Scientific writing is a symbolic and rhetorical practice, historically and socially constructed, which evolves over time (Bazerman 1988).
"There is no 'faceless' writing, and all stance choices are important rhetorical decisions that affect how the message is received and the ways readers react to a text" (Hyland and Jiang 2016, 258).
The results of our investigation, which focuses on the markers of subjectivity by which authors communicate uncertainty, are consistent with other diachronic studies on the expression of authors' subjectivity in medical articles that show a progressive decrease in self-mentions (e.g. Atkinson 1992, 1996). Several non-diachronic studies have also highlighted this trend, which seems to be related to an increase in the use of the passive voice (e.g. Amdur et al. 2010, Hyland 2001, Rundblad 2007, Segal 1993). Indeed, as Hyland and Jiang (2016, 265) claim, in the sciences in general it is a common, consolidated practice for writers to downplay their personal role to highlight the phenomena under study, the replicability of research activities, and the generality of the findings, subordinating their own voice to that of unmediated nature. Such a strategy subtly conveys an empiricist ideology that suggests research outcomes would be the same irrespective of the individual conducting it.
As, to the best of our knowledge, no studies deal specifically with the hedging functions of markers of subjectivity in diachronic medical corpora, direct comparisons are difficult.
However, our results seem, on the one hand, to be consistent both with those of Hyland (1998, 364), who identifies a "predominant view of science as an impersonal, inductive enterprise," and, to some extent, with those of Hyland and Jiang (2016). Although analysing non-medical corpora, Hyland and Jiang highlight a progressive decrease (from 1965 to 2015) in the use of both hedges and authors' self-mentions, which is, surprisingly, more evident in the soft fields, particularly applied linguistics, than in the hard sciences (such as electronic engineering and biology). They explain this trend by the increasing movement in the soft fields toward a more "author-evacuated" prose, which mimics hard-science practices and accompanies the current orientation toward more "objective," empirically grounded, and quantitative approaches. Nonetheless, a recent study by Poole et al. (2019), based on a diachronic corpus of articles from a non-medical hard field (biochemical research), reports results that diverge from those of Hyland and Jiang (2016). It reveals a consistent decrease in epistemic stance items indexing uncertainty and an increase in boosters, but the authors attribute this trend to the peculiarity of their corpus: not a generic collection of biochemical articles, but a selection centred on a single common topic (chemotaxis). In their view, high levels of confidence and certainty are connected to this specialised corpus, and they conclude that "author presence as reflected in epistemic stance features becomes less overt as a discipline adopts a shared understanding of a phenomenon" (Poole et al. 2019, 9).
On the other hand, our results seem to diverge from those of Millar et al. (2013), who found less frequent use of passive constructions in a corpus of randomised controlled trials published in the BMJ (2005) than in other medical journals, probably because of the BMJ guidelines (https://www.bmj.com/about-bmj/resourcesauthors/house-style, last accessed July 2019), which recommend using the active voice and the first person where necessary. This does not seem to hold for SUMs: the main results of our study show that, in the BMJ articles we analysed, SUMs decrease over time until they almost disappear.
In particular, the overall results of our investigation reveal that:
a) SUMs are used less than non-SUMs, both in the whole corpus and in the articles of each period;
b) SUMs diminish over time;
c) among SUMs, the most used are I/we modal verbs, followed by I/we epistemic verbs, but these decrease in the third period and almost disappear in the fourth;
d) although first-person singular SUMs are more numerous overall than first-person plural SUMs, their proportion changes in the third and fourth periods (even though in the third period almost all the articles are written by a single author);
e) the amount of uncertainty communicated by SUMs in the IMRaD subcorpus (third and fourth periods) is lower than that communicated by SUMs in the non-IMRaD subcorpus (third and fourth periods).
Even though it is impossible to identify with certainty all the variables responsible for the observed progressive eclipse of SUMs, it is reasonable to suppose that the following play a role.
The evolution of medical research and practice. Over the timespan considered, observations made by a single doctor treating a single case (or a small number of cases) have progressively been replaced by laboratory-based experimental studies, epidemiological studies, cluster randomised trials, etc., performed by research teams (sometimes heterogeneous in terms of competencies, specialisations, nationality, etc.). At the same time, new and advanced methods and technologies have been developed (for research, diagnosis, treatment, etc.), allowing researchers to achieve increasingly objective results and to limit subjectivity.
The evolution of medical scientific writing. Changes in medical research and practice have clearly influenced medical written communication. Leading medical journals have gradually shifted their attention to different types of articles. Until the latter half of the nineteenth century, a large share of articles were case reports, presented by (mostly single) authors in a personal, narrative style; this gradually changed. Case studies by single authors progressively diminished in favour of articles reporting the results of increasingly complex and highly specialised studies conducted by teams of medical researchers. In line with these changes in types of studies and articles, the rhetorical style of their communication has also changed: descriptive rather than narrative, objective rather than subjective, and increasingly structured.
It should be noted that the American National Standards Institute formally defined the IMRaD structure only in 1979. As Skelton (1997) reports, the set of constraints on the presentation of research in medical journals was first published in 1979 by four medical journals (including the BMJ and The Lancet), under the title "International Steering Committee. Uniform requirements for manuscripts submitted to biomedical journals." These recommendations, known as the Vancouver style (after the city where the Committee first met, in 1978), consist of a set of "uniform technical requirements" concerning the ethical procedures to be followed, the presentation of the bibliography, and the structural elements of articles. The fourth version of these recommendations was published in the BMJ by the International Committee of Medical Journal Editors (1991); it defined the division of observational and experimental articles into sections headed Introduction, Methods, Results, and Discussion.
Clearly, scientific, epistemological, and social changes affect writing styles over time (Atkinson 1992, 1996). In this study, to account for the progressive decrease of SUMs, we focussed on two variables: the types of articles and the presence of the IMRaD structure. Different types of articles rely to different degrees on SUMs to communicate the writers' commitment to the propositional content. Specifically, our analysis reveals that, as expected, SUMs are more numerous in Lectures and Clinical cases than in Original research papers, where data and findings are conveyed more objectively, especially in the fourth period. In other words, when communicating their uncertainty, authors use markers of subjectivity above all in Lectures and Clinical cases (types of articles mostly found in the first two periods).
Analogously, the IMRaD structure (present in our corpus only from the third period, 1921–1960, onwards) also seems to affect the use of SUMs: IMRaD articles contain proportionally fewer SUMs. As expected, SUMs, like UMs, occur more frequently in the Introduction and Discussion sections. These results are consistent with those of Skelton (1997) and Skelton and Edwards (2000), and also with those of a recent study by Keramati et al. (2019), who analysed stance and engagement markers in a diachronic corpus of IMRaD articles (1996–2016) drawn from three leading journals in applied linguistics. The authors observe a significant decrease in self-mention in the Method section, which might reflect a more hard-science orientation among researchers even in applied linguistics, but a sharp rise in the Introduction, which they interpret as evidence of the development of a promotional, consumer-oriented discourse.
Our previous analyses (…, Zuczkowski et al. 2016) revealed that uncertainty decreases over time, although not significantly. Within this overall decrease, the uncertainty communicated by self-mentions (i.e. by SUMs) decreases as well.
Author contributions: Conceptualisation: IR, RB and AZ; methodology: IR and RB; data analysis: IR and RB; writing (original draft preparation): IR and RB; writing (review and editing): IR, RB, AZ. All authors have accepted responsibility for the entire content of this manuscript and approved its submission.

Conflict of interest:
Authors state no conflict of interest.